THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 










Volume XXXI 


January, 1940 Number 1 














GRADE TRENDS IN READING PROGRESS IN 
KINDERGARTEN AND PRIMARY GRADES¹


FRANK T. WILSON 
Hunter College, New York City 


AND 


CECILE WHITE FLEMMING 
Horace Mann School, Teachers College, Columbia University 


I. INTRODUCTION


An intensive study of twenty-five children in a beginning first grade 
of the Horace Mann School during the school year 1933-1934 showed 
that of one hundred six obtained measures of the pupils’ abilities, 
traits, and characteristics, certain abilities with letters correlated more 
highly with a variety of objective reading test scores and the teacher’s 
judgment of reading ability, than any other of the measures. These 
letter abilities were: Naming small and capital letters, giving letter 
sounds, pronouncing phonic combinations, and writing or printing 
small and capital letters and words. 

The following year a group of twenty four-year-old kindergarten 
children, another of fifty-four five-year-old kindergartners, two other 
groups of first-grade and two of second-grade pupils were given some 
of the same letter tests used the first year, and a wider variety of reading
tests, to see if similar relationships would be found. In general the
results verified the first year’s findings. Accordingly, the third year
the significant letter tests and some of the reading tests were given
once more from the four-year kindergarten level through the third
grade, to a total of nine groups and two hundred thirty-nine children.

1 This report presents a minor phase of a study of reading readiness and reading
progress in the primary grades of the Horace Mann School, Teachers College,
Columbia University. The study has been made possible by the cooperation of
Miss Agnes Burke, teacher of Grade I, and other teachers of kindergarten and
primary grades. It has been under the supervision of Doctor Cecile White
Flemming, Director of Individual Development and Guidance, and of Doctor
Rollo G. Reynolds, Principal. Prepared with assistance of the United States
Works Progress Administration, New York City, Project Number 165-97-6172,
Sub-project 4.

All of the letter tests were given as individual tests. The same was 
true of the reading tests given to the kindergarten and beginning 
Grade I pupils. The reading tests given to the later Grade I and to the 
second and third grades were administered as group tests, excepting 
the Gates Diagnosis oral reading and word recognition tests. 

This report summarizes the findings for the three years as to the 
relationships between the letter abilities and reading tests. 


II. SOURCES OF DATA AND TECHNIQUE 


Table I shows the numbers and grades of the children who were 
tested. 


TABLE I.—PUPILS BY YEARS TESTED, GROUPS, AND NUMBERS

                         1933-1934           1934-1935           1935-1936
                      Number   Number     Number   Number     Number   Number
Grade                   of       of         of       of         of       of
                      groups   pupils     groups   pupils     groups   pupils

4-year Kindergarten     ..       ..          1       16          1       21
5-year Kindergarten     ..       ..          1       54          1       48
I                        1       25          2       58          3       59
II                      ..       ..          2       55          2       56
III                     ..       ..         ..       ..          2       55

Total groups             1                   6                   9
Total grades             1                   4                   5
Total pupils                     25                 183                 239

Grand total 16 groups 447 pupils.


Table II shows the reading tests used in the study and the
grades in which they were given.

The symbols tests used were certain subtests of the Gates Diagnosis 
Reading Tests, or adaptations of them. Table III shows their nature 
and the grades in which they were used. 

Not all of the reading and symbols tests were given to every group
of the various grades, for the simple reason that time and facilities
made it impossible to do so. Some groups were given many more
tests than others. It is believed, however, that the results obtained
were sufficiently representative of the abilities of the different grades
to indicate important facts.


TABLE II.—READING TESTS USED BY GRADES

                                              Grades
                                    Kindergarten   I    II   III
Gates Primary Reading Tests:
  Type 1, word recognition ........      x         x    x
  Type 2, sentences ...............                x    x
  Type 3, paragraphs ..............                x    x
Stanford Achievement Tests:
  [subtest name illegible] ........                     x    x
  [subtest name illegible] ........                     x    x
Progressive Achievement Tests:
  Vocabulary ......................                     x    x
  Comprehension ...................                     x    x
Gates Diagnosis Reading Tests:
  Oral reading ....................                     x    x
  Word recognition ................                     x    x
Teachers Ratings:
  [rating name illegible] .........      x         x    x    x
  [rating name illegible] .........      x         x    x    x


TABLE III.—SYMBOLS TESTS USED BY GRADES

                                              Grades
                                    Kindergarten   I    II   III
Pronouncing phonic combinations ...      x         x    x    x
[test name illegible] .............      x         x    x    x
Blending letter sounds ............      x         x    x    x
Recognizing letter sounds .........      [marks partly illegible]
Naming capital letters ............      x         x    x    x
Writing capital letters ...........      x         x    x    x
[test name illegible] .............      x         x    x    x
[test name illegible] .............      [marks partly illegible]
[test name illegible] .............      x         x    x    x

Averages for each grade group were computed for each test. The
results were treated, also, by the correlation technique using the
formula of rank order differences, $\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$,
where D is the difference between a pupil's ranks on the two tests and
N is the number of pupils. The
coefficients between each reading and each symbols test were computed
for each group of children. For summary purposes these coefficients
were averaged by grades.
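
The rank-difference computation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the five-pupil scores are hypothetical, and giving tied scores the mean of their rank positions is a common convention that the report does not describe.

```python
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores from 1 (highest) downward; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one (a tie).
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_difference_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), with D the rank difference."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length, N >= 2")
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical scores for five pupils (not data from the study).
letter_scores = [1, 2, 3, 4, 5]
reading_scores = [2, 1, 4, 3, 5]
print(rank_difference_rho(letter_scores, reading_scores))
```

In the study one such coefficient was computed for every pairing of a reading test with a symbols test within each group, and the resulting coefficients were then averaged.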


III. FINDINGS 


1. Averages.—Tables IV and V give the averages for the several 
groups in each of the symbols and reading tests that were used during 
the school year 1935-1936. These averages are presented in the belief 
that they represent the relative abilities of the groups more fully than 
any other set of averages. The averages of every group for all tests
given during the three years of the experiment are on file in the office 
of the Director of Individual Development and Guidance in the 
Horace Mann School. 

Table IV shows the possible scores for each of the nine symbols
tests and the total scores for the nine tests for each grade group. The
totals show increasing abilities from the four-year kindergarten group
through Grade III. The results for the subtests, however, indicate
that abilities varied with the tests at all grade levels. For example,
in the four-year kindergarten group, an average score of 0.8 was made
on giving letter sounds, and one of 4.9 on naming capital letters. In
Grade II, group D, the average in recognizing short vowel sounds was
2.3, while naming the small and capital letters and writing capital
letters and digits averaged 10 each. While groups in the same grade
varied considerably on certain tests, there was more commonly quite
close agreement. In general, the trend was one of increase in score in
each test from grade to grade.


TABLE IV.—GROUP AVERAGES FOR NINE SYMBOLS TESTS BY GROUPS AT EACH
GRADE LEVEL. SPRING, 1936

Tests: (1) phonic combinations; (2) giving letter sounds; (3) recognizing
short vowel sounds; (6) naming capital letters; (8) writing small letters;
(4), (5), (7), (9) [column headings illegible].

                               Number
Grade                 Group   of pupils    (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   Total

Possible score                              20     10     10     10     10     10     10     10     10     100

4-year Kindergarten     C        21         ..    0.8    0.8    1.9    4.5    4.9    3.7    0.7    2.5    19.8
5-year Kindergarten     B        48        0.2    2.8    0.8    3.0    6.9    4.6    5.8    1.1    4.1    29.3
I                       A        23        6.5    5.9    2.0    6.4    9.3    8.0    8.5    7.5    7.7    61.8
I                       F        26        6.9    5.5    3.5    6.5    8.9    8.6    8.3    6.5    7.3    62.0
I                       G        10        3.6    4.9    1.2    5.4    7.8    8.5    6.9    6.3    8.4    53.0
II                      D        28        7.3    7.4    2.3    7.2   10.0   10.0   10.0    9.1   10.0    73.3
II                      E        28        9.2    6.8    2.7    8.2   10.0   10.0    9.7    9.1   10.0    75.7
III                     H        29       11.6    6.5    2.7    8.2    9.9    9.9    9.7    9.4   10.0    77.9
III                     I        26       14.7    7.9    4.0    9.5   10.0    9.8    9.7    9.5    9.9    85.0


The table also shows that certain of the abilities measured reached 
high levels earlier than others. In at least three instances the abilities 
were not yet mastered in the third grade. The lowest of the three was 
that of recognizing the short vowel sounds, which had averages of 
2.7 and 4.0 for the two third grades. The other two unperfected 
abilities were giving phonic combinations and giving letter sounds. 
In contrast to these abilities at least four others showed averages in 
Grade II equal to the possible scores; namely, naming the small and 
capital letters and writing the capital letters and digits. The compara- 
tive easiness of these four tests and the difficulty of the former three are 
evident throughout the entire group of classes. One ability—writing 
small letters—seemed little developed in the kindergarten groups, but 
much improved in Grade I. It was not perfected, however, even in 
Grade III, due in large part to reversal confusions of the d and b, and 
the p and q by some children. 

In summary it seems that knowledge of letter forms appeared 
earliest and was perfected by Grade III, as evidenced by ability to 
name capital and small letters and to write the capital letters. Ability 
to write the digits was developed by Grade II. Giving phonic combinations,
and giving and recognizing letter sounds apparently began
later and were not fully developed in Grade III. It seems possible that
recognizing consonant letter sounds may be somewhat easier than
giving them. The most imperfect ability of all was recognizing the
short vowel sound under the conditions of the test.

Table V shows the average scores in four reading tests given in 
1935-1936. The kindergarten and first-grade groups were tested in 
the Spring by the Gates Primary Reading Tests. The table indicates 
low scores for both the kindergarten groups in word recognition, but a 
somewhat larger score for the five-year group than for the four-year 
children. The first grades made, of course, considerably higher 
scores in this test, and also showed marked abilities in the other types
of reading covered by the two other tests of the battery. 

Increase in abilities from Grade II to Grade III is shown by the 
three tests given each of those groups. The scores for the oral reading 
test are in terms of errors, which accounts for the decrease in the 
figures for that test from Grade II to Grade III. 


TABLE V.—AVERAGES FOR READING TESTS BY GROUPS AT EACH GRADE LEVEL,
1935-1936

                                          G.P.R.,1 Spring          St. Ach.,2 January     Prog.,3 Spring         G.R.D.,4 Fall
                               Number   Type 1.  Type 2.  Type 3.                                                Oral
Grade                 Group    of       Word     Sen-     Para-    Vocabu-   Compre-     Vocabu-   Compre-      reading   Word
                               pupils   recog-   tences   graphs   lary      hension     lary      hension      errors    recog-
                                        nition                                                                            nition

4-year Kindergarten     C        21       3.5
5-year Kindergarten     B        48       8.2
I                       A        23      23.3     94.3     38.4
I                       F        26      26.5     23.5     15.4
I                       G        10      22.8     19.6     13.0
II                      D        28       ..       ..       ..     [illegible]           [illegible]             42.5      25.2
II                      E        28       ..       ..       ..     [illegible]           [illegible]             34.2      37.4
III                     H        29       ..       ..       ..     41.3      34.4        55.3      15.6          17.5      51.9
III                     I        26       ..       ..       ..     [illegible]           [illegible]              4.7      61.3

1 Gates Primary Reading Tests.
2 Stanford Achievement Tests.
3 Progressive Achievement Tests.
4 Gates Reading Diagnosis Tests.



2. Correlations.—Table VI gives the averages of the rho coefficients
of correlation by tests and grade groups for the data of the three years. 
Column (1) indicates the grades, column (2) the various groups in each 
grade by letter designations, column (3) the number of children in 
each group, columns (4), (6) and (8) the averages of the coefficients 
of each symbol test and the reading tests, for each group of pupils, 
columns (5) and (7) the number of coefficients used to get these 
averages, and columns (9), (10), (11) and (12) the averages of these 
average coefficients, by grades. As will be seen by columns (5) and 
(7) most of the group averages were derived from several coefficients. 

Objection may be made to averaging averages to secure the grade 
average figures. This method was used because it was believed that 
TABLE VI.—GROUP AND GRADE AVERAGES OF FOUR HUNDRED SIXTY-SIX COEFFICIENTS
OF CORRELATION BETWEEN READING TESTS AND SYMBOLS TESTS

                                         1933-1934       1934-1935            1935-1936         Grade averages:
                                Number    Average     Number    Average    Number    Average    (9) Kindergarten,
Grade                  Group    of        of rhos     of rhos   of rhos    of rhos   of rhos    (10) I, (11) II,
                                pupils                making               making               (12) III
                                                      averages             averages
(1)                    (2)      (3)       (4)         (5)       (6)        (7)       (8)        (9)-(12)

(a) Phonic Combinations and Reading
I                        A      25         .701
I                        F      31, 26                   4       .665        2       .540
I                        G      27, 10                   3       .743        2       .825       .695
II                       D      30, 28                  28       .863        6       .630
II                       E      25, 28                  11       .604        6       .747       .711
III                      H      29                                           5       .690
III                      I      26                                           5       .510       .600
Totals: 3                7                 1            46       4          26       6

(b) Giving Letter Sounds
5-year Kindergarten      B      54, 48                   3       .650                           .650
I                        A      25, 23     .607                              2       .585
I                        F      31, 26                   4       .458        2       .750
I                        G      27, 10                   3       .637        2       .765       .634
II                       D      30, 28                  28       .524        6       .417
II                       E      25, 28                  10       .106        6       .533       .395
III                      H      29                                           5       .456
III                      I      26                                           5       .268       .362
Totals: 4                [illegible]       1            48       5          25       7

(c) Naming Small Letters
4-year Kindergarten      C      21                                           1       .720
5-year Kindergarten      B      54, 48                   5       .702        1       .430       .617
I                        A      25, 23     .594                              2       .555
I                        F      31, 26                   4       .400        2       .705
I                        G      27, 10                   3       .820        2       .705       .630
II                       D      30, 28                  10       .740        6       .462
II                       E      25, 28                  10       .240        6       .467       .478
III                      H      29                                           5       .552
III                      I      26                                           5       .266       .389
Totals: 5                9                 1            32       5          28       9
TABLE VI.—Continued

                                         1933-1934       1934-1935            1935-1936         Grade averages:
                                Number    Average     Number    Average    Number    Average    (9) Kindergarten,
Grade                  Group    of        of rhos     of rhos   of rhos    of rhos   of rhos    (10) I, (11) II,
                                pupils                making               making               (12) III
                                                      averages             averages
(1)                    (2)      (3)       (4)         (5)       (6)        (7)       (8)        (9)-(12)

(d) Writing Small Letters
4-year Kindergarten      C      21                                           1       .640
5-year Kindergarten      B      48, 54                   3       .473        1       .350       .488
I                        A      23                                           2       .375
I                        F      26                                           2       .435
I                        G      10                                           2       .625       .478
II                       D      28                                           6       .570
II                       E      28                                           6       .442       .506
III                      H      29                                           5       .574
III                      I      26                                           5       [illegible] [illegible]
Totals: 5                [illegible]                     3       [illegible] 30      9

(e) Naming Capital Letters
4-year Kindergarten      C      [illegible]              1       .520        1       .730
5-year Kindergarten      B      54, 48                   5       .560        1       .390       .550
I                        A      25, 23     .467                              2       .640
I                        F      [illegible]              4       .478        2       .705
I                        G      27, 10                   3       .727        2       .755       .625
II                       D      30, 28                   5       .620        6       .557
II                       E      [illegible]             10       .521        6       .577       .569
III                      H      29                                           5       .578
III                      I      26                                           5       .506       .542
Totals: 5                [illegible]       1            28       6          30       9

(f) Writing Capital Letters
4-year Kindergarten      C      [illegible]              1       .510        1       .560
5-year Kindergarten      B      54, 48                   3       .617        1       .460       .538
I                        A      25, 23     .583                              2       .520
I                        F      26                                           2       .560
I                        G      10                                           2       .815       .619
II                       D      28                                           6       .483
II                       E      28                                           6       [illegible] .510
III                      H      29                                           5       .426
III                      I      26                                           5       [illegible] [illegible]
Totals: 5                [illegible]       1             4       2          30       9
TABLE VI.—Continued

                                         1933-1934       1934-1935            1935-1936         Grade averages:
                                Number    Average     Number    Average    Number    Average    (9) Kindergarten,
Grade                  Group    of        of rhos     of rhos   of rhos    of rhos   of rhos    (10) I, (11) II,
                                pupils                making               making               (12) III
                                                      averages             averages
(1)                    (2)      (3)       (4)         (5)       (6)        (7)       (8)        (9)-(12)

(g) Blending Sounds to Make Words
5-year Kindergarten      B      54, 48                   3       .583                           .633
I                        A      25         .340
I                        F      31                       4       .335
I                        G      27                       3       .567                           .414
II                       D      30                      10       .484
II                       E      25                       4       .175                           .330
Totals: 3                [illegible]       1            24       5

(h) Recognizing Letter Sounds Heard
                                                                           1935-1936            1935-1936
                                                                           Consonants           Vowels
                                                                           Number    Average    Number    Average
4-year Kindergarten      C      20, 21                   1       .500        1       .460
5-year Kindergarten      B      54                       3       [illegible]                                        .451
I                        A      25, 23     .381                              2       .880        2       .080
I                        F      31, 26                   4       .623        2       .595        2       .340
I                        G      27, 10                   3       .693        2       .815        2       .575       .492
II                       D      30, 28                  30       .567        6       .417        6       .298
II                       E      25, 28                   4       .268        6       .667        6       .442       .443
III                      H      29                                           5       .430        5       .085
III                      I      26                                           5       .114        5       .224       .213
Totals: 5                [illegible]       1            45       6          29       8          28       7

such figures would be more indicative of trends that were present than 
would the method of putting the results from two or more rooms into 
one correlational computation, since teachers used somewhat different 
procedures in their several rooms. 
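
The averaging of averages described above can be checked with a short sketch. The group averages below are the phonic-combination figures as read from Table VI(a); the variable and function names are illustrative assumptions. Each group average counts equally here, whether it rests on 28 coefficients or on 2, which appears to be the procedure the authors defend.

```python
from statistics import mean

# Group averages of rho for phonic combinations and reading, as read from
# Table VI(a). Each entry is one group in one school year.
group_averages = {
    "I": [0.701, 0.665, 0.540, 0.743, 0.825],
    "II": [0.863, 0.630, 0.604, 0.747],
    "III": [0.690, 0.510],
}


def grade_average(averages):
    """Unweighted mean of group averages, rounded to three decimal places."""
    return round(mean(averages), 3)


# One grade average per grade, matching columns (10) to (12) of the table.
grade_averages = {grade: grade_average(v) for grade, v in group_averages.items()}
print(grade_averages)
```

A weighted alternative would multiply each group average by its number of coefficients, but that would let a single heavily tested room, such as group D in 1934-1935, dominate its grade.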

The table indicates that the highest and most consistent relation- 
ship was between reading and phonic combinations. The smallest 
grade average for this set of correlations was .600, for Grade III. One
of the third grades averaged .690, the other .510. Grade II produced
an average correlation of .711, and Grade I, .695. While these coefficients
may not be considered high, statistically speaking, they are at
the same time much higher than coefficients ordinarily found in relationships
which practically are considered to be significant. This
relationship between phonics and reading was found to be conspicuous 
in the first year’s data. In fact, among over two thousand correlations 
obtained that year, those between reading test scores and phonics were 
outstanding—much higher than those between reading and mental 
age, or intelligence quotient, chronological age, vocabulary, informa- 
tion, or any other of the one hundred six measures obtained. 

The relationship between reading and the ability to give letter 
sounds showed quite high coefficients for kindergarten and Grade I, 
the grade averages being .650 and .634, respectively. The averages 
for Grades II and III, however, were much lower—.395 and .362. 
There were so many coefficients concerned in these averages that it 
would seem the differences might truly indicate grade trends. To 
have ability to give letter sounds seemed more important in the 
beginning stages of reading than in Grades II and III.

The ability to name small letters (and presumably to recognize 
them) seemed also to have special importance in the beginning stages.
The table shows that the average coefficient for the kindergartens was 
.617 and for Grade I, .630. The relationships fell considerably lower 
in Grades II and III; namely, to .478 and .389, respectively. Ability 
to write small letters did not give as high average correlations with 
reading as ability to name them did. Ability to name and write 
capital letters gave relatively fairly high average correlations with 
reading, especially in Grade I. 

Ability to blend sounds into words and ability to recognize sounded 
letters gave lower average correlations. It would seem that these 
abilities might be very specialized ones, dependent upon instruction in 
them, which was not given to any great extent in the grades studied. 
It would have been interesting to see whether the third-grade children 
might have given a higher average correlation between reading and 
blending sounds than did the younger children, but since no important 
relationship had been indicated by other groups, time was not taken 
from other work to give the third grades this individual test. 

It is possible that other letter abilities might, in other groups, have 
shown closer relationships to reading ability than was found in this 
study, but the ones discussed were the only ones found to indicate 
significant relationships to reading in the groups studied. 

The significant way in which the relationships between phonics and
reading, and between naming the small letters and reading were
present, is brought out more clearly in Table VII, which is part of the
detailed table from which Table VI was made. Particular attention
is called to the high coefficients shown in Table VII between phonics and
reading for the D group. This was a second-grade class which included
twenty-one of the twenty-five children studied intensively in Grade I
the previous year, and was taught by the same teacher, who thus had
the children for two years. This teacher used what she called “functional”
learning of the sounds and recognition of letters. She did not
drill on either the alphabet or phonics, but helped her pupils day by day
to use both sounds and forms of letters as means in reading. The
table shows that the twenty-eight coefficients made up an average
correlation between phonic ability and reading of .863, and ranged from
.75 to .93, consistently high coefficients with fifteen different reading
measures.


TABLE VII.1—SEPARATE COEFFICIENTS BETWEEN READING TESTS AND TWO
LETTER TESTS, 1934-1935 PUPILS

First part. Reading measures: Gates Primary Reading Tests, October (Type 1,
Type 2, Type 3); Stanford Achievement Tests, January (Word, Sentences,
Paragraphs, Total).

                               Number   Number   Average
Grade                 Group    of       of       of        Coefficients, in the order printed
                               pupils   rhos     rhos

(a) Phonic Combinations
I                       F      31        4       .665      ..
I                       G      [illegible]
II                      D2     30       28       .863      .87 .89 | .89 .88 | .89 .85 | .90 .87 | .90 .84 | .88 .79 | .93 .90
II                      E      [illegible]                 [illegible] .65 .63 .67 .69

(b) Naming Small Letters
5-year Kindergarten     B      52        5       .702      .71
I                       F      31        4       .400      ..
I                       G      29        3       .820      ..
II                      D      30       10       .740      .75 .74 .69 .80 .70 .69 .78
II                      E      26       10       .240      [illegible] .27 [illegible] .21 .20 .20

Second part. Reading measures: Gates [heading partly illegible] (Type 1,
Type 2, Type 3); Progressive Achievement Tests, May (Vocabulary,
Comprehension); Gates D. (Oral Rdg); Teacher Ranking.

Grade                 Group    Coefficients, in the order printed

(a) Phonic Combinations
I                       F      .62 .60 .65 .71
I                       G      .77 [illegible] .68 [illegible] .78
II                      D      .84 .87 | .79 .79 | .78 .75 | .87 .80 | .92 .90 | .92 .92 | .86 .88
II                      E      .63 .61 .54 .53 .53 [illegible] .64

(b) Naming Small Letters
5-year Kindergarten     B3     [illegible] .60 .62
I                       F      .34 .38 .36 .52
I                       G      .80 .80 [illegible] .86
II                      D      [illegible] .79 .77 .69
II                      E      .18 .26 .50 .12 .18 .28

1 Table VII is an extract of the detailed table from which Table VI was made.

2 Group D was given two tests in phonic combinations, and each was correlated with each
reading test.

3 Group B was given two tests in phonic combinations, both of which were correlated with the
Spring measures of reading.


In the same table it may be seen that the coefficients of correlation 
for naming small letters and reading were relatively high for the 
kindergarten, the G group in Grade I, and the D group in Grade II, 
but quite low for the F group of Grade I, and very low for the E group 
of Grade II. The same general facts were true for giving letter sounds.
In explanation it may be said that the teacher of the E group gave 
special and somewhat intensive attention to phonic drills, rather than 
to a “‘functional”’ use of abilities with letter sounds and forms. Thus, 
perhaps, less relationship of these abilities to reading was developed. 
On the other hand, the kindergarten B group teachers were particularly 
interested in helping their children whenever occasion arose to use 
letter sounds and forms functionally. 


IV. CONCLUSIONS 


Before attempting to draw conclusions it is necessary to indicate 
the sort of school experiences provided the children studied in the 
Horace Mann School. In a broad sense, the experiences were child 
life activities rather than formal ones. Children’s interests, needs, and 
abilities, in an environment rich in variety and character of experi- 
ences, were dominant elements of the school work. Reading as a 
school subject was not forced upon children, although in Grades II and 
III effort was made to induce all children to try to master the 
mechanics of reading. In a way, then, progress in reading for these 
children might be said to have been natural. Their progress thus not 
only illustrated readiness for various steps in mastering the mechanics 
of reading, but also showed what, in a relatively natural way of
progressing in reading, the steps themselves were. 

From learning of this sort the following conclusions seem warranted: 

(1) Beginning stages in mastering the mechanics of reading seemed 
to be concerned mainly with the form, names, and sounds of letters. 
Progress began in these respects before Grade I: for many in the four- 
year kindergarten; and for some individuals even earlier. 

(2) Ability to handle phonic combinations became especially and 
increasingly important in Grade I and continued so through Grades 
II and III. 

(3) Abilities to name and write capital letters, and to write small 
letters seemed also significant in kindergarten and Grades I and II, 
but somewhat less so than phonic abilities or learning names and 
sounds of letters. 

(4) Abilities to blend sounds into words, or to recognize letter 
sounds heard seemed of less importance to kindergarten and Grades I 
to III than the abilities named in points (1) to (3) above. 

(5) The functional learning of these abilities seems to have been 
closely related to reading progress. 
THE MEASUREMENT OF INDIVIDUAL DIFFERENCES
IN GENERAL ENGLISH VOCABULARIES 


ROBERT H. SEASHORE 
Northwestern University 


AND 


LOIS D. ECKERSON 
University of Southern California 


I. BACKGROUND OF THE PROBLEM 


In attempting to answer the apparently simple question, “How
many words do you know?” numerous investigations have discovered
that individual vocabularies, from childhood to maturity, are much 
larger than we have previously thought. It is not at all uncommon 
for an educated adult to estimate his own vocabulary at a figure which 
is only one to ten per cent of the results obtained from accurate 
measurement. But in spite of the general agreement on the relatively 
large size of measured vocabularies, there is an almost equally wide 
variation in the absolute size of vocabularies studied, even for com- 
parable groups. This suggests the need of a careful analysis of 
definitions and methods to determine the sources of such variations, 
and to set up a more reliable method for such measurements. 

But beyond the interest in the mere size of vocabularies and their 
measurement there are many significant relations of vocabulary to 
problems of psychology, education, and literature. In a figurative 
way it might even be said that words, or the concepts for which they 
stand, are the coin of the realm of knowledge. These units of knowl- 
edge may be accurately defined, classified, and their numbers counted 
so that they give an inventory of the raw materials among our intel- 
lectual resources. When supplemented by measures of various skills, 
we may then estimate a person’s effectiveness in dealing with various 
problems, both theoretical and practical. 

Some of these relationships between vocabulary and other factors 
have already been measured, but the real significance of these and 
similar problems can best be indicated by a careful analysis of the 
background and methods of the experiments themselves. 


A. Analysis of Major Variables 


1. Definition and Classification of Words as Units of Measurement. 
For the purposes of our study we may begin by defining the unit of
measurement, a word, as an item listed in a standard unabridged
dictionary, Funk and Wagnalls’ New Standard Dictionary of the English
Language, two volume edition of 1937.¹⁰ The editors of this dictionary
state that it includes about four hundred fifty thousand entries,
but this includes about eighty-six thousand five hundred seventy alter- 
native meanings which are simply numbered and listed under a 
single word. If we omit these alternative meanings, in keeping with 
the ordinary definition of a word, there are only three hundred seventy 
thousand two hundred sixty words in this dictionary. Alter- 
native spellings (about twelve thousand six hundred seventy), usually 
bracketed together, are not counted as separate words by the dic- 
tionary or by us. 

The Funk and Wagnalls dictionary was chosen for this study for 
two reasons: (1) It is one of the few unabridged volumes having all 
words in a single alphabetical order, which facilitates the construction 
of representative short sample lists for the purposes of testing; and 
(2) it is large enough to give full opportunity for nearly all educated 
adults to test all of the words which they were likely to know, but does 
not include the very rare words which would be almost unknown except 
to the few specialists in languages who might prefer the larger and 
more authoritative Oxford dictionary. In order to avoid disputes on 
definitions or usages, persons tested were given credit for knowledge 
of a word if they mentioned any meaning listed by either Webster’s 
or Funk and Wagnalls’. This degree of accuracy was sufficient for 
our purposes. 

As a rough distinction in the importance of words, the Funk and 
Wagnalls dictionary lists certain words, which we shall call “basic,”
in heavier type and next to the margin. “Derivative” terms, includ- 
ing compound terms, are usually listed in medium type and indented 
under the basic term. For example, the basic word “loyal,” has listed 
under it four derivatives, as follows: Loyal Legion (a Civil War order); 
loyalism (n.); loyalize (v.); and loyally (adv.). Since some persons 
may wish to consider the derivatives as only minor variations of the 
basic words, we have studied the two groups separately so that they 
may be considered either separately or together. Philologists have 
shown that most of our modern language words are really derived from 
a relatively small number of root words, but the evolution has extended 
back over so long a time and the modifications are in many cases so 
great that the origin of most words is not recognized by persons 
who are not language specialists. If we are to judge the separateness 
of words by the criterion of how an individual originally learned them,
whether independently or simply as variants of some earlier known
word, the dictionary distinction of listing “basic” and “derivative”
words should give us a meaningful basis for our study, and further 
analyses may be easily extended in later studies. People who are 
skilled in recognizing derivations beyond these rather obvious varia- 
tions have usually learned the process of derivation long after they 
acquired most of their vocabulary as separate words. 

An alternative definition of “a word” for testing purposes is that 
employed by Thorndike*! and others in measuring use vocabularies. 
Thorndike’s rules for classifying words as mere variations of a main 
word include all of the following changes: 


(a) Plurals, s, or changing y to ies.

(b) Adverbs, ly. 

(c) Comparatives and superlatives, er and est. 

(d) Verb forms s, d, ed, ing. 

(e) Past participles adding n. 

(f) Adjectives, adding n to proper nouns. (Adjectives adding al or ate are 
not mentioned.) 
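
The rules above can be sketched as a naive suffix-stripping routine. This is an illustration only, not Thorndike's own procedure: the function name, the small lexicon, and the choice to accept a stripped form only when it is itself a listed word are all assumptions made for the example.

```python
def main_word(word, lexicon):
    """Return the main word that `word` is a variation of, else `word` itself.

    A stripped form counts only if it appears in `lexicon`, a set of
    lower-case main words (a hypothetical stand-in for a word list).
    """
    w = word.lower()
    candidates = []
    if w.endswith("ies"):        # (a) plurals formed by changing y to ies
        candidates.append(w[:-3] + "y")
    if w.endswith("s"):          # (a) plurals in s; (d) verb forms in s
        candidates.append(w[:-1])
    if w.endswith("ly"):         # (b) adverbs in ly
        candidates.append(w[:-2])
    if w.endswith("est"):        # (c) superlatives in est
        candidates.append(w[:-3])
    if w.endswith("er"):         # (c) comparatives in er
        candidates.append(w[:-2])
    if w.endswith("ed"):         # (d) verb forms in ed
        candidates.append(w[:-2])
    if w.endswith("d"):          # (d) verb forms in d
        candidates.append(w[:-1])
    if w.endswith("ing"):        # (d) verb forms in ing
        candidates.append(w[:-3])
    if w.endswith("n"):          # (e) past participles and (f) adjectives adding n
        candidates.append(w[:-1])
    for candidate in candidates:
        if candidate in lexicon:
            return candidate
    return w


# Hypothetical lexicon of main words.
lexicon = {"loyal", "pony", "walk", "love", "tall", "know", "america"}
words = ["loyally", "ponies", "walked", "loved", "tallest", "known",
         "American", "loyalism"]
print({w: main_word(w, lexicon) for w in words})
```

Under these rules "loyally" folds into "loyal," while "loyalism" remains a separate unit, which is the kind of difference between the two definitions that the conversion constant is meant to capture.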


These main words correspond roughly to our “basic” words, and 
by counting the number of words in the dictionary which would be 
combined with our basic list according to Thorndike’s definition we 
may derive a constant for translating our type of unit into the Thorn- 
dike type of unit. Either method is meaningful, and since both have 
been employed in vocabulary studies, it is necessary to remember their 
relationship. By counting all of the words in the dictionary which 
would be grouped with our basic sample according to Thorndike’s 
definition we find that, in proportion, our estimate of three hundred 
seventy thousand two hundred sixty as the total number of words 
in the dictionary would be equivalent to a much smaller number of 
words according to Thorndike’s definition.* However, following 
commonest usage, we shall hereafter use our own definition of a word 
unless specifically stated otherwise. 


A further analysis of the number and kinds of words found in the
dictionary is given in Table I, which represents the totals from an
analysis of the words on a sample of twenty-seven pages equally spaced
throughout the dictionary.

* Such a comparison will be reported in a later study. The problem is com-
plicated by the fact that there are many other types of variations besides those
counted by Thorndike’s system.

TABLE I.—ANALYSIS OF PROPORTIONS OF BASIC AND DERIVATIVE WORDS FROM
SAMPLE PAGES OF THE FUNK AND WAGNALLS NEW STANDARD DICTIONARY

                 | “Basic”    | “Derivatives”
                 | marginal   | Other     | Compound | Multiple | Variants | Total       |
                 | words in   | parts of  | words    | meanings | in       | derivatives |
Page             | heavy type | speech    |          |          | spelling |             | Total
101              | 70         | 17        | 33       | 23       | 5        | 78          | 148
201              | 34         | 11        | 112      | 18       | 2        | 143         | 177
301              | 58         | 6         | 95       | 24       | 2        | 127         | 183
401              | 40         | 27        | 91       | 29       | 4        | 151         | 191
501              | 62         | 11        | 36       | 40       | 5        | 92          | 154
601              | 55         | 9         | 97       | 60       | 8        | 174         | 229
701              | 38         | 16        | 32       | 27       | 3        | 78          | 116
801              | 40         | 22        | 50       | 24       | 2        | 98          | 138
901              | 49         | 19        | 67       | 39       | 5        | 130         | 179
1001             | 89         | 45        | 19       | 28       | 8        | 100         | 189
1101             | 107        | 15        | 28       | 30       | 5        | 78          | 185
1201             | 53         | 39        | 27       | 35       | 7        | 108         | 161
1301             | 47         | 43        | 58       | 18       | 5        | 124         | 171
1401             | 67         | 6         | 16       | 45       | 3        | 70          | 137
Total            | 809        | 286       | 751      | 440      | 64       | 1551        | 2358
Average Vol. I   | 57.7       | 20.4      | 53.6     | 31.4     | 4.6      | 111         | 168
Average Vol. II  | 62.9       |
Average, I + II  | 60.3       |
Times (number of pages in dictionary):
                 | 166,247    | 56,243    | 147,775  | 86,570   | 12,672*  |             | 466,760

56,243 + 147,775 = 204,018 (total derivative words included in our definition of a word)
86,570 + 12,672* = 99,242 (not included in our definition of a word)
466,760 minus 12,672* = 454,088 (total number items in dictionary)

* Alternative spellings.


Assuming that these pages are a representative sample of all the 
words in the dictionary, we find that the total number of words accord- 
ing to our definition is divided into approximately one hundred sixty- 
six thousand basic words and two hundred four thousand derivative 
words. 
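
The totals quoted here follow from multiplying the per-page averages of Table I by the number of pages in the dictionary. A minimal sketch of that arithmetic is given below. The page count of 2757 is an assumption: it is not stated in the text and was back-computed from the Table I figures, and the variable and function names are illustrative only.

```python
# Assumption: number of dictionary pages, back-computed from Table I
# (166,247 / 60.3); the text does not state this figure.
PAGES_IN_DICTIONARY = 2757

# Per-page averages taken from Table I.
AVERAGE_PER_PAGE = {
    "basic": 60.3,                  # Average, I + II
    "other_parts_of_speech": 20.4,  # Average Vol. I
    "compound_words": 53.6,         # Average Vol. I
}


def scale_to_dictionary(average_per_page, pages=PAGES_IN_DICTIONARY):
    """Estimate a dictionary-wide count from a per-page average."""
    return round(average_per_page * pages)


totals = {kind: scale_to_dictionary(avg) for kind, avg in AVERAGE_PER_PAGE.items()}

# Derivative words included in the authors' definition of a word.
derivative_total = totals["other_parts_of_speech"] + totals["compound_words"]
```

With the assumed page count, the scaled figures reproduce the Table I entries for basic words (166,247) and included derivative words (204,018).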

Table II gives further evidence that our sample is representative
of the kinds and proportions of various parts of speech in the dictionary.

TABLE II.—THE PERCENTAGE OF PARTS OF SPEECH OF BASIC WORDS IN SAMPLES
FROM THE DICTIONARY

2. Methods of Sampling.—The basic theory underlying the measurement
of vocabulary size is that of obtaining a representative sampling
of the total material. Obviously it is inexpedient to test a
person on all of the words in the dictionary, but even if we had the
necessary time and facilities, there would be little point in doing so.
Scientists and technicians have verified many times the fact that
measurements of minute samples of very extensive material will give
results which are almost as accurate as for the total itself, providing
that the samples can be known to be representative of the kinds and
amounts of materials occurring in the total amount. Instances in
which we base our decisions on minute samples could be cited from
any field of science, industry, and every-day life. We judge from a
sample what the rest is like, and if the sample is carefully taken we are
usually accurate in our estimate of the whole.

The dictionary, which includes in a single alphabetical order all
of the words to be tested, is very advantageously arranged for the
construction of such test samples. We wished to obtain a sample of
several hundred words which would have the same proportion of words
beginning with each letter of the alphabet as the proportions for the
dictionary as a whole. We, therefore, took the third “basic” or marginal
item in the first column of every left hand page in the dictionary,
which gave us a total of one thousand three hundred seventy-eight 
words, of which fifty-eight were prefixes, suffixes, or abbreviations and, 
therefore, unusable, leaving only one thousand three hundred twenty 
useful words. Since we wished eventually to construct a number of 
alternative forms of the test, this list was then divided into four 
equivalent subgroups, as described later. Annen,¹ in a previous study
with Seashore, had shown that the test results from one such sample 
are reasonably close to those from other samples. Our own basic list 
Number 1 has since been found to give adult individual test scores for 
the odd-numbered items which have an average variation of only four 
per cent from the scores on the even-numbered items, a remarkably 
small variation when we consider that the complete list included only
1/505 of all such words in the dictionary.
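
The sampling rule described above (the third basic item in the first column of every left-hand page) can be sketched as follows. The data layout is hypothetical: `first_columns` stands in for the printed dictionary, left-hand pages are assumed to be the even-numbered ones, and unusable entries are recognized here only by a leading or trailing hyphen or a trailing period, which is a simplification of the authors' own judgment.

```python
def sample_basic_words(first_columns, position=3):
    """Take the position-th basic entry from the first column of each left-hand page.

    first_columns maps a page number to the list of basic (marginal) entries
    in that page's first column. Assumption: left-hand pages are even-numbered.
    """
    sample = []
    for page_number in sorted(first_columns):
        if page_number % 2 != 0:
            continue  # right-hand page: not sampled
        entries = first_columns[page_number]
        if len(entries) >= position:
            sample.append(entries[position - 1])
    return sample


def usable_words(sample):
    """Drop prefixes, suffixes and abbreviations (marked here by '-' or '.')."""
    return [
        w for w in sample
        if not (w.startswith("-") or w.endswith("-") or w.endswith("."))
    ]
```

Because one item is drawn per left-hand page, letters that occupy more pages of the dictionary contribute proportionally more words to the sample.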

Our method of sampling is technically known as the method of 
work limits, since it sets a standard amount of work to be done and 
measures the number of words correctly defined in unlimited time. 
In this case it acts as a “power” test since the individual’s score is
determined by the levels of his word knowledge, but if a relatively 
short time limit is employed, it emphasizes the speed factors, which 
are related to, but not the same as, level of ability. 

In certain studies of children it is possible to record all of the words 
used in a variety of situations over a period of days until no new 
words appear except those learned during the period. This gives a 
measure of the number of words which the individual did use, but it 
does not take into account those which he could have used if neces- 
sary, nor those which he could only recognize. This method is rarely 
applied to adults, whose language is too varied in relation to different 
situations to permit an accurate estimate of the absolute total from 
any short sampling of time. It can, however, be applied to the 
measurement of written vocabularies actually used by authors where 
their entire list of publications can be studied, but even these omit 
many words which the author probably could have used or at least 
understood in situations other than those covered in his writings. 

3. Size of Vocabularies in Certain Typical Situations.—At this point 
it may be well to indicate a few figures on vocabulary size in represen- 
tative situations. Among the English writers Shakespeare, for 
instance, used about fifteen thousand different words (Thorndike
type of units) in his published writings, Milton about eleven thousand
different words, while the French author Victor Hugo used about
twenty thousand words. Older writers such as Shakespeare had a 
much smaller English language to draw from, and we know nothing 
about the additional words which they could have used if necessary, 
or at least understood. This distinction as well as the difference in 
units of measurement must be remembered in comparing these 
figures with those from tests on dictionary samples. 

Thorndike³¹ has determined by actual count the twenty thousand
words most frequently used in every-day writing and speaking. As 
would be expected, a relatively small list of words accounts for a large 
proportion of all the words we use. 

For example, in Franklin D. Roosevelt’s speech of acceptance of
the nomination for the presidency of the United States in 1932, seventy
per cent of the words employed are found in Thorndike’s list of the 
five hundred words most commonly used in every-day writing and 
speaking. Seventy-five per cent of the words fell within the com- 
monest one thousand, and so on, in decreasing increments for suc- 
cessively higher levels. 
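
The coverage figures in this example amount to asking what fraction of the running words of a text fall within the first N entries of a frequency-ranked list. A minimal sketch is given below. The function name and the toy inputs are hypothetical, and a real count would need the Thorndike list itself and his rules for grouping variant forms.

```python
def coverage(tokens, frequency_ranked_words, top_n):
    """Fraction of running words in tokens found among the top_n ranked words."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    common = set(frequency_ranked_words[:top_n])
    hits = sum(1 for token in tokens if token.lower() in common)
    return hits / len(tokens)


# Toy illustration only: a five-word text and a four-word ranked list.
toy_tokens = "the cat and the dog".split()
toy_ranking = ["the", "and", "cat", "dog"]
toy_curve = [coverage(toy_tokens, toy_ranking, n) for n in (1, 2, 4)]
```

Evaluating the function at successively larger values of `top_n` gives the kind of rising curve with decreasing increments that the passage describes.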

C. K. Ogden?* has found it possible to select eight hundred fifty 
general English words (Thorndike units), which together with one 
hundred fifty additional words from any specialized field such as a 
business or science are sufficient to enable a foreigner to communicate 
with anyone who now speaks English. Ogden has simply eliminated 
most of the synonyms and other equivalent words in favor of this 
simpler list, and in the same way has eliminated many of the alter- 
native grammatical forms. The combination of simplified vocabulary 
and grammar is called Basic English,?* and has considerable promise 
as an international language, since it is merely a restricted sample of 
the language which is already the most widely used throughout the 
world. 

For the present, however, and as long as we wish to talk to people 
and to read the works of authors who have not restricted themselves 
to such a basic list, it is necessary to know more than these com- 
monest words. The recognition vocabulary of an educated adult must 
be much larger, and, as our measurements will show, is considerably 
larger than even the vocabulary used by an author such as Shakes- 
peare. In addition there are the specialized vocabularies of each 
technical field, not fully covered by a general dictionary. Measurement
of these is very important as an indicator of aptitude or
accomplishment in these special fields.

4. Criteria of Knowledge.—Here we must distinguish between
several common criteria, all of which are meaningful but have
different significance.

(a) Criteria of Recognition.—Perhaps the commonest definition of
the term “knowing a word” is to be able to recognize or identify its
meaning. To recognize a word is to refer it to a larger group of words, 
a genus, and to distinguish its meaning from that of other related 
species of that genus. Thus we say that lead belongs to the group 
(genus) of metals and has a certain specific gravity and other physico- 
chemical characteristics which differentiate it from other species of 
metals. 

Since many words have more than one meaning it is necessary to 
test vocabulary in several ways. The most important way appears to 
be a test for knowledge of the commonest meaning, which for our 
purposes was taken to be the dictionary meaning most frequently 
given by preliminary groups of university undergraduates. Further 
research is under way to measure individual differences in the richness 
of meaning, as shown by knowledge of alternative meanings, and dic- 
tion, or choice of words which are nearly equivalent but have slightly 
different connotations. These two separate types of tests may then 
be compared with that for knowledge of commonest meanings to 
determine the interrelations of these abilities. 

In constructing a test for the recognition of the commonest meaning 
of a word we have chosen the multiple-response method with four 
choices. The person being tested selects a synonym or equivalent 
phrase as the correct answer, distinguishing it from several alter- 
native species of the same genus (the next larger classification of words 
as given in Roget’s Thesaurus or a similar work in technical fields) and 
also from other words which might be confused with the word to be 
defined because of similarities in sound or spelling. Among professional
linguists it might be worth while to insist upon very fine discriminations
in alternative meanings, but for our purposes in a general
test it is sufficient to discriminate the commonest meaning from those 
which are definitely incorrect but which might be sources of confusion 
to non-specialists in the field. 

1’. Effect of Context upon Recognition in Vocabulary Tests.—In 
measuring recognition of a word we may choose to test the subject’s
knowledge of separate words, or of words appearing in a context. The
ability to recognize a word should theoretically be aided by having it 
appear in a context such as a sentence. The only quantitative 
evidence so far reported in this field is that of Annen.¹ She found that
in the Inglis multiple-response vocabulary test,!? eliminating the 
illustrative sentence, which normally appears before the word to be 
defined, gave no smaller scores than on a subsequent administration 
of the regular test with the context left in. It is still possible that 
other types or amounts of context would make some difference, but 
at least it is quite striking to find that it made none in this case. 

2’. Correction for Guessing in Multiple-response Tests.—Annen 
similarly found that on equivalent samples of dictionary words, the 
average scores for various criteria of knowledge (merely checking a 
word as known, defining the word in their own terms, giving a sentence 
to illustrate its meaning) all agreed within eight per cent of the total 
as estimated from a multiple-response recognition test after correction 
for guessing. This correction assumes that on a four-choice
multiple-response test a subject knows certain words so definitely as to
require no guessing. On the remainder of the words he would have one
chance in four if he simply guessed without any knowledge at all.
The actual score ($S$) which he would make on this type of test would
then be made up of the number of words definitely known ($K$), plus
one-quarter of the remaining words ($R$) in the test. This formula is
then written $S = K + \tfrac{1}{4}R$. From this we may deduce that the
actual number of errors ($E$) is equal to only three-fourths of the words
on which it was necessary to guess, and the formula may then be
written $K = \text{total words in test} - \tfrac{4}{3}E$. This assumes that it is
proper to count several instances of partial knowledge as equivalent
to one instance of complete knowledge, perhaps a debatable point,
but as shown by the fact that these corrected recognition scores were
not more than eight per cent higher than use scores involving no
guessing, the error, if any, is a minor one.
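
The correction for guessing can be written out as a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as the text does, that every word not definitely known is answered by a blind guess among equally likely choices.

```python
def expected_score(known, total_items, choices=4):
    """Score S expected when `known` items are certain and the rest are guessed."""
    remaining = total_items - known
    return known + remaining / choices


def known_words(total_items, errors, choices=4):
    """Corrected score K: total items minus choices/(choices - 1) times the errors."""
    return total_items - errors * choices / (choices - 1)


# Worked example: 100 items, 60 definitely known, 40 guessed.
# Expected right by guessing: 10, so S = 70 and the errors number 30.
example_score = expected_score(60, 100)
example_known = known_words(100, 30)
```

In the worked example the thirty errors are three-fourths of the forty guessed items, so the correction recovers the sixty words definitely known.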

The findings on the relative unimportance of context and the lack 
of any great difference in difficulty of use or recognition criteria suggest
that knowledge of a word is ordinarily “all or none,” that is, if
you can recognize it at all you can also use it. This might be explained 
by the fact that the range of difficulty among words is very great. 
The majority of simpler words have been both recognized and used 
many times. The most difficult words have neither been recognized
nor used, and so there remains only a limited number of words for
which we have partial knowledge.

(b) Criteria of Use.—A second criterion of knowledge is that of 
use, either by defining a word in your own terms or by a synonym, or 
by using it in a sentence to illustrate its meaning. A test of this type 
is easy to set up, simply giving the directions and the standards to be 
used in scoring. In our case the standards were approximately the 
same (genus-species) as for the recognition tests. This type of count,
based upon a dictionary sampling, should be called a test of “possible
use,” to distinguish it from that of actual use as found for instance in
studies of authors’ vocabularies in published works. A further distinction
in the criterion of actual use would be that of use in writing or
use in speaking, since we probably are less formal and apt to restrict 
ourselves to simpler words in our speaking vocabularies. 

A modified use test which is practicable along with a recognition 
test is to have the subject mark a W before each item which he has 
actually used in some previous writing, and an S for each item pre- 
viously used in speaking. The total of the words marked both W 
and S, plus those marked W or S alone gives a sample for estimating 
the total vocabulary of actual use. In the same way the subject 
may mark a P to indicate that he could possibly use a given word if 
necessary. In estimates of both actual and possible use, credit 
should be given only for those words in which the multiple-response 
recognition answer was also correct. 
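
The tallying of W, S, and P marks can be sketched as below. The data layout and names are hypothetical. The sketch assumes that a word counts toward actual use if it carries W, S, or both, that P marks are counted separately as possible use, and that, as stated above, no credit is given unless the recognition answer for that word was also correct.

```python
def use_counts(marks, recognized):
    """Return (actual_use, possible_use) counts for one subject.

    marks maps each test word to the set of letters the subject wrote
    before it ("W", "S", "P"); recognized is the set of words whose
    multiple-response answer was correct.
    """
    actual_use = 0
    possible_use = 0
    for word, letters in marks.items():
        if word not in recognized:
            continue  # no credit without a correct recognition answer
        if letters & {"W", "S"}:
            actual_use += 1
        if "P" in letters:
            possible_use += 1
    return actual_use, possible_use
```

The resulting counts are sample counts only; they would still have to be scaled by the sampling fraction of the test to estimate the total vocabulary of use.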

Annen,¹ using shorter lists than ours, reports moderate positive
correlations, about +.40 between tests employing the four following 
criteria: Checking a word as known, written definitions, use in illus- 
trative sentences, and multiple-response recognition. As mentioned 
previously, the group average for student vocabularies on these various 
tests agreed within eight per cent. This study is being repeated by 
Poindexter with our present list of words. 

In addition to the definition of a word and the criterion of knowl- 
edge, a measure of vocabulary must also define the standard of 
accuracy to be demanded in giving credit for knowing a word. Per- 
haps the best general criterion is to be able to give or recognize a 
synonym or an equivalent term, and correctly to distinguish this 
from similar words (species) in the same larger group (genus) and also 
from words having a fallacious resemblance in spelling or sound. The
correct use of a word in an illustrative sentence would involve a similar
type of discrimination and could be judged by the same general
criteria.

3’. Control of Incidental Cues in Recognition Tests.—In the case of
recognition tests our preliminary experiments in the construction of 
multiple-response tests indicated that the alternative responses must 
be selected very carefully according to a number of principles in order 
to avoid either an excessive degree of difficulty, through too fine distinctions
in meaning, or the occurrence of “give away” items which
would allow credit for incidental cues other than legitimate knowledge 
of word derivations. Our set of rules was obtained from the leads 
given by introspective reports of a few well-educated adult subjects 
who described the manner in which they selected the probably correct 
answer on each item of the test. After revision of the faulty items in 
line with these findings, the tests were then given to larger groups of 
college undergraduates to make sure that each item was actually 
functioning, that is, receiving some votes, and that the subjects in 
the highest quartile of the group tended to make fewer wrong answers 
on each item than did the members of the lowest quartile (the quartiles 
being selected on the basis of scores from the test as a whole). The 
easiest items have not yet received this careful analysis because they 
are too easy for our subjects, being known by practically all college 
groups, and our youngest subjects in this part of the study included 
only ages twelve to eighteen years. 
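
The two checks applied to each item (that every alternative receives some votes, and that the highest quartile makes fewer wrong answers than the lowest) can be sketched as follows. The function and field names are hypothetical, answers are assumed to be coded as choice indices 0 to 3, and the quartiles are formed simply by sorting subjects on total score and taking the bottom and top quarter.

```python
def item_analysis(responses, key, n_choices=4):
    """Check each item for functioning alternatives and for discrimination.

    responses[s][i] is the choice index picked by subject s on item i;
    key[i] is the correct choice index for item i.
    """
    n_subjects = len(responses)
    scores = [sum(r == k for r, k in zip(row, key)) for row in responses]
    order = sorted(range(n_subjects), key=lambda s: scores[s])
    quarter = max(1, n_subjects // 4)
    lowest, highest = order[:quarter], order[-quarter:]

    report = []
    for item, correct in enumerate(key):
        votes = [0] * n_choices
        for row in responses:
            votes[row[item]] += 1
        wrong_high = sum(responses[s][item] != correct for s in highest)
        wrong_low = sum(responses[s][item] != correct for s in lowest)
        report.append({
            "item": item,
            "all_choices_used": all(v > 0 for v in votes),
            "discriminates": wrong_high < wrong_low,
        })
    return report
```

An item that fails either check would be a candidate for revision before the next trial form.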

The rules adopted for standardizing the difficulty of the discrimi- 
nations to be called for in our recognition test eventually worked out 
as follows: 


RULES FOR SELECTION OF MULTIPLE CHOICE ANSWERS 


(1) Choose words on parallel levels under the next larger class group, as 
given by Roget’s Thesaurus or a similar classification. 

(2) Include words similar in sound or root. 

(3) Balance the length of words or phrases as between correct and incor- 
rect choices, both on the same item and throughout the test. 

(4) Keep the same part of speech for all four items. 

(5) Avoid the use of “give-away” answers such as opposites or similarities 
in sound or root. 

(6) Keep the level of difficulty of the answers lower than that of the word
which is to be tested (judging by Thorndike’s Teacher’s Word Book of twenty
thousand words where possible).

Keeping in mind these various definitions of a word, the criteria of
knowledge, and the standards of accuracy called for, we may review 
in Table III the results of previous experiments on the measurement of 
vocabulary size. Note especially that in these studies the size of the 
estimated vocabularies is roughly proportional to the size of the 
dictionary from which the sampling was taken, and that none of these 
investigators has used a dictionary as large as the Funk and Wagnalls 
unabridged edition of four hundred fifty thousand words employed 
in the present study. Our own present choice of that size was due in 
part to the information from an unpublished study by Hartmann, who 
showed that such a dictionary is necessary to provide adequate oppor- 
tunity for measuring the vocabularies of educated adults. 

5. Interrelations with Other Abilities.—Terman,³⁰ in his construction
of the old Stanford-Binet intelligence test discovered that the
section comprising a fifty-word vocabulary test was more predictive 
of the IQ to be obtained from the whole test than any other three 
parts of the test in combination. In keeping with this finding Sea- 
shore, Stockford and Swartz?* found the following correlations between 
a preliminary form of our present test (Form A, which was one-third 
longer) and other educational tests: 


r vocabulary vs. three speed of reading tests ............... .02 to .04   N = 83
r vocabulary vs. Otis Advanced Intelligence Test, Form B .... .49
r vocabulary vs. A.C.E. Intelligence Test ................... .57          N = 43
r vocabulary vs. reading comprehension test on psychology ... .60          N = 100 (approx.)


Evidently, then, vocabulary is closely related to or an essential 
part of the complex abilities known as general intelligence and reading 
comprehension, while speed of reading is largely an independent 
variable, at least within the limited range of college students. 
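
For readers who wish to reproduce coefficients of this kind, the product-moment correlation between two sets of paired scores can be computed as in the sketch below. The function name is illustrative, and the score lists are made-up values used only to exercise the formula, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between paired score lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists of at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical paired scores (vocabulary, other test), for illustration only.
vocabulary_scores = [1, 2, 3, 4]
other_scores = [2, 4, 6, 8]
r_example = pearson_r(vocabulary_scores, other_scores)
```

A value near zero, as for the speed of reading tests, indicates little linear relation, while values such as .57 or .60 indicate a substantial one.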


II. OBJECTS OF THE PRESENT EXPERIMENTS 


(A) To construct a set of tests for estimating the total size of 
general English vocabularies in terms of dictionary entries 
as previously defined. These tests include: 

(1) A multiple-response recognition test for all commoner 
basic words, the most important group. 

(2) A supplementary use test for knowledge of rare basic 
words where only a few are likely to be known.

TABLE III.—PREVIOUS ESTIMATES OF TOT…

Author | Date | Reference* | Basis of estimate | Kind of test | Grammar school (four columns)
1 | 2 | 3 | 4 | 5 |
Dean Farrar | ... | 7, 13, 21, 24 | “Listened for a long time together to the conversation of three peasants who were gathering apples” | Use, spoken | ... | ... | ... | ...
D’Orsey, A. D. | ... | ... | ... | ... | ... | ... | ... | ...
Baird, W. R. | 1891 | 3 | Total words needed in a foreign country to get along | Use, spoken | ... | ... | ... | ...
Marsh, G. P. | 1872 | 7, 20 | Counts of authors’ writings | Use, written | ... | ... | ... | ...
Holden, E. S. | 1878 | 14 | Counts of authors’ writings and concordances | Use, written | ... | ... | ... | ...
Kirkpatrick, E. A. | 1891, 1907 | 18, 19 | Webster’s Academic Dictionary; 28,000 words | Checking recognition | 4,480 | 6,6.. | ..20 | 7,860
Doran, E. W. | 1902 | 7 | International Dictionary, Abridged | ... recognition | ... | ... | ... | ...
Babbitt, E. H. | 1907 | 2 | ...; 100,000 words | ... | ... | ... | ... | ...
Bonser, F. G.; Burch, L. H.; Turner, M. R. | 1915 | 4 | Webster’s Academic Dictionary, 28,000 words; Webster’s Elementary School Dictionary, 44,000 words | Checking recognition (Group A, Group B) | ... | ... | ... | 10,886
Terman, L. | 1916 | 30 | Laird and Lee’s Webster’s vest-pocket dictionary; 18,000 words | Oral definition; Use | ... | 3,600 (MA 8) | ... | 5,400 (MA 10)
Gerlach, F. M. | 1917 | 11 | Funk and Wagnalls New Standard Dictionary; 250,000 words | ... 4-choice recognition, ... definition; Use | ... | ... | ... | ...
Brandenburg, C. C. | 1918 | 5 | Webster’s Academic Dictionary; 28,000 words | Definition; Use | 4,000 | 5,429 | 6,887 | 8,207
Neher, H. L. | 1918 | 22 | Laird and Lee’s Webster’s vest-pocket dictionary; 18,000 words | Definition; Use | ... | ... | ... | ...
Holley, C. E. | ... | ... | ... | ... | ... | 3,144 | 5,220 | 6,030
Symonds, P. M. | 1926 | 29 | E. L. Thorndike’s Test of 1921 based on his Word Book | 5-choice recognition | ... | ... | 4,000 | 6,000
 | | | | Completion; Use | ... | ... | 5,000 | 6,900
Gillette, J. M. | 1927 | 12 | Used Terman’s test; dictionary of 209,000 ... | Checking recognition | ... | ... | ... | ...
Cuff, N. B. | 1930 | 6 | Webster’s Shorter School Dictionary; 35,000 words | 4-choice recognition | ... | 7,425 | 10,395 | 12,460
Seashore, R. H. | 1933 | 1, 27 | Funk and Wagnalls Dictionary; 145,000 words; derivatives in unabridged | 4-choice recognition | ... | ... | ... | ...

* The number given in the “Reference” column refers to the bibliography at the end of this paper.

(3) Alternative use or recognition criteria for a small sample of
derivative words which are of less importance and for 
which this is only a preliminary study. 

(B) To determine the average size, range, variability, and norms 
for such measurements among college undergraduate 
students. 

(C) To determine the accuracy of such samplings and measure- 
ments as shown by the percentage of variation between odd- 
and even-numbered items of our main test, and by the 
correlation between odd and even halves of the test. 

(D) To obtain preliminary data on the size of vocabularies of use, 
including actual use, written and spoken, and possible use. 

(E) To investigate further the interrelations of the size of recognition
vocabularies with other intellectual abilities such as
reading comprehension, reading speed, and general intelli- 
gence, and the attainment of college grades. 
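
Objective (C) above turns on comparing each person's score on the odd-numbered items with his score on the even-numbered items. A minimal sketch of that comparison follows. The names are hypothetical, and the definition of percentage variation used here (the difference between the two half scores as a per cent of their mean) is an assumption, since the text does not give the exact formula.

```python
def odd_even_scores(item_correct):
    """Split one subject's item results (1 = right, 0 = wrong) into half scores.

    Items are numbered from 1, so list index 0 is item 1 (odd).
    """
    odd = sum(item_correct[0::2])
    even = sum(item_correct[1::2])
    return odd, even


def percent_variation(odd, even):
    """Absolute difference between half scores as a per cent of their mean.

    Assumption: this is one plausible reading of 'percentage of variation'.
    """
    mean = (odd + even) / 2
    if mean == 0:
        return 0.0
    return abs(odd - even) / mean * 100
```

Averaging `percent_variation` over subjects gives a group figure of the kind quoted for the main test, and the same half scores can be correlated across subjects to obtain the odd-even correlation.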


III. PROCEDURE IN THE PRESENT EXPERIMENTS


In setting up the tests for measurement of individual differences 
in knowledge of the commonest meaning of words the following steps 
were employed: 

A. Construction of a sample list of words from Funk and Wagnalls’ 
unabridged New Standard Dictionary. 

(1) Master sample: Every third basic (heavy type, marginal) 
item down from the top of the first column on the left-hand pages of 
the dictionary was taken, giving a total of one thousand three hundred 
seventy-eight, of which fifty-eight were prefixes, suffixes, abbrevi- 
ations, and other unusable entries, leaving a list of one thousand 
three hundred twenty useful words. 

(2) Subdivision into test samples: The master sample of one
thousand three hundred seventy-eight word entries was then divided
into four shorter equivalent forms by placing the first word (alphabeti- 
cally) in Form 1, the second word in Form 2, and so on in rotation. 
Since there were slight variations in the numbers of prefixes, etc. which 
were eliminated, the four forms finally totalled three hundred thirty- 
one, three hundred thirty-one, three hundred twenty-nine, and three 
hundred twenty-nine words, respectively. This slight variation in
actual length of the four samples is of no importance because we need
to know only the proportion of the sample which can be correctly
defined, in relation to the size of the entire dictionary.
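
The rotation into four forms, and the reason the unequal lengths do not matter, can be sketched as follows. The function names are illustrative. The figure of 166,000 basic words used in the example is the approximate total given earlier in the text, and the sketch assumes a simple proportional estimate.

```python
def split_into_forms(master_sample, n_forms=4):
    """Deal an alphabetical master sample into forms in rotation.

    The first entry goes to Form 1, the second to Form 2, and so on.
    """
    return [master_sample[i::n_forms] for i in range(n_forms)]


def estimate_vocabulary(n_correct, n_items, dictionary_total):
    """Scale the proportion of a form correctly defined up to the dictionary."""
    return n_correct / n_items * dictionary_total


# Placeholder entries standing in for the 1,378 sampled dictionary items.
master = list(range(1378))
forms = split_into_forms(master)
form_sizes = [len(form) for form in forms]
```

Since the estimate depends only on the proportion `n_correct / n_items`, a form of 329 words and one of 331 words are scaled to the same dictionary total.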

In order to arrange the words according to difficulty we next gave 
the four lists to groups of undergraduate students to define in their 
own terms or to use in illustrative sentences. All words which were 
correctly defined or used by at least one person (using a liberal scoring) 
were included in a list of common basic words, with which we are 
principally concerned henceforth. The remaining unknown words, 
together with all proper nouns, obsolete, archaic, provincial, and other 
words similarly designated by the dictionary were included in a sup- 
plementary list of rare basic words for each form. 

(3) Preparation of Form 1* for use in testing:

(a’) The one hundred seventy-eight common basic words of Form
1† were next put into the form of a four-choice multiple-response test
(Part 1), the alternative answers being chosen in accordance with our
preliminary findings on the way in which such items function (cf.
p. 24). Three successive revisions of the test were tried out in
mimeograph form and each was revised in accordance with the results of
the item analyses as previously described.

(b’) The one hundred fifty-eight rare basic words from Form 1
were listed separately as Part 2 of the printed test, and the following
directions given for use criteria: “Supplementary List of Proper Names,
Geographical Locations and Rare Words. Define, identify, or use
in illustrative sentences, being sure to give all meanings known, and
compare with usage given in manual, from F. and W. unabridged
dictionary.”‡ Although the use criterion is slightly more difficult than
that of recognition, the average number of correct answers (about
three for college students) was so small as to make the error almost
negligible in counting this as an addition to the recognition scores on
Part 1. In this test the observers were only credited if they gave the
particular meaning listed by the dictionary for that particular item,
even if it had other commoner meanings. The directions took account
of this by calling for all meanings for all words known. The one
hundred seventy-eight words of Part 1 and the one hundred fifty-eight
words of Part 2 together constitute 1/505th of all such basic
words in the dictionary.

* Published by the authors, Evanston, Ill.

† In order to insure maximum reliability we first included one-third of Form 4
along with Form 1, the composite being known as Form A (the first printed version).
When experiments showed that this form was slightly too long for use in class
periods, the additional items from Form 4 were dropped and subsequent printed
editions of the test have included only the words from Form 1.

‡ At present the score on this part may be simply estimated from the score
on the previous list (Part 1) according to manual of directions.

(4) Construction of Part 3, test for derivatives: Since it is still an 
open question as to whether or not derivative terms should be con- 
sidered as separate words, we conducted only a preliminary study on 
this list, which included all terms (including compound terms) indented 
and in medium type under our basic words in the dictionary. Since 
the full list of words was found to involve too great a burden upon the 
subjects tested, we further reduced the list by taking only every tenth 
word, beginning with the tenth. This left a sample of forty-six words 
which then constituted Part 3 of our test. This sample represented
1/4430th of all such derivative words in the dictionary. The directions
for Part 3 were as follows: “Derived Terms. (Abbreviated
list.) For each word or phrase in this list see if you could think of 
(or write, if requested) a specific instance in which it would be cor- 
rectly applied so as to illustrate its meaning. In compound or 
technical terms you should know the main points, (especially those 
underlined in the scoring manual) not merely a general idea of its 
meaning. Mark each item plus or minus, then correct this scoring 
after comparison with the definitions from the manual.” 

The rating criterion was justified by the fact that the authors had 
previously shown that multiple choice scores of basic and derivative 
words in an earlier test were correlated to the extent of r = .60, so 
that if only a moderate degree of accuracy is desired we may simply 
estimate a person’s score on Part 3 from his score on Part 1 (as given 
later in Table V). 

A slightly higher degree of accuracy may be obtained by having 
the subject rate his knowledge for each word in comparison with the 
definition later read to him, or where still greater accuracy is desired, 
he can write out the definitions or illustrations to be scored by the 
experimenter from the list of correct definitions. 

By having our Ohio State University group of two hundred twenty- 
seven undergraduate students mark their status at ten-, twenty-, and 
thirty-minute time limits we have obtained preliminary norms for 
scoring the vocabulary test under conditions emphasizing speed. In 
this scoring the number of words attempted in a given time minus 
1.33 times the number of errors equals the number of words correctly 
identified without benefit of guessing. 
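
The scoring rule just stated can be sketched in a few lines. This is only an illustration: the function name and the sample student are hypothetical. The factor 1.33 is what the customary correction (rights minus one third of wrongs) yields when each item offers four choices, which is an assumption here and is not stated in this section.

```python
def corrected_score(attempted: int, errors: int, penalty: float = 1.33) -> float:
    """Words known = words attempted minus penalty times errors (rule from the text)."""
    if errors > attempted:
        raise ValueError("errors cannot exceed words attempted")
    return attempted - penalty * errors


# Hypothetical student: 130 words attempted in twenty minutes, 9 of them wrong.
example = corrected_score(130, 9)
print(round(example, 2))
```

The result is read against the decile norms for the corresponding time limit.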


TABLE IV.—DECILE RANKS FOR TWENTY-MINUTE TIME LIMIT

No. of Words Known in Part 1
(Attempts − 1.33 × Errors)          Decile

        128-141                       10
        122-127                        9
        118-121                        8
        114-117                        7
        111-113                        6
        107-110                        5
        103-106                        4
         98-102                        3
         92- 97                        2
         50- 91                        1
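
Reading a decile from Table IV amounts to finding the band into which a corrected score falls. The following is a minimal sketch under two assumptions: the ten bands run from decile 10 at the top to decile 1 at the bottom, and scores between printed bands (e.g., 121.5) belong to the lower band. The names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds of the ten score bands in Table IV, lowest band first.
LOWER_BOUNDS = [50, 92, 98, 103, 107, 111, 114, 118, 122, 128]


def decile_rank(score: float) -> Optional[int]:
    """Return the decile (1-10) for a twenty-minute corrected score.

    Scores below the lowest tabled value return None; scores above the
    highest tabled value are placed in the tenth decile.
    """
    if score < LOWER_BOUNDS[0]:
        return None
    # Count how many lower bounds the score has reached.
    return bisect_right(LOWER_BOUNDS, score)


print(decile_rank(118.03))
```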
IV. RESULTS AND DISCUSSION 


The results of the three parts of the test are summarized in Table 
V, which gives the means, standard deviations, ranges, and decile 
ranks as obtained from various groups of college undergraduates. 
Parts 1 and 2 are based upon tests of two hundred thirty-seven Ohio 
State University students in an undergraduate course in educational 
psychology, and one hundred sixteen Northwestern undergraduate 
students, both groups including principally freshmen and sophomores. 
Part 3 has not yet been adequately standardized, the norms given 
being based upon only one hundred twenty Northwestern freshmen and 
sophomores who employed the self-rating criterion. Another similar 
class of sixty students, including more freshmen, had previously 
employed both the self-rating and the written definition criteria, 
knowing that they would be compared. This group gave practically 
equal results for the two criteria, but the average (78,000) was lower 
than the average for the larger group (100,000) which employed only 
self-ratings. This may have been due in part to two other factors: 
the considerably larger proportion of freshmen in the smaller class, 
and the implication of a more stringent standard of accuracy. Further 
studies are under way to provide more representative norms for this 
part of the test. 

Preliminary work with children in the first eight grades of school 
has indicated the following points: 

(1) All such children can be tested by the criterion of use, either 
definitions or sentence illustrations, on our list of words in Form 1, 
TABLE V.—ESTIMATION OF COMPLETE VOCABULARY FROM PART I SCORES
(UNLIMITED TIME)

Part I, errors     Part I,    Part II,   Part III,
(not corrected     basic      basic      derivative   Total     Decile
for guessing)      common     rare

    21-31          69,900     2,525      120,150      192,575     10
    32-34          65,200     2,020      113,475      180,695      9
    35-37          63,200     1,768      106,800      171,768      8
    38-39          61,500     1,768      102,350      165,618      7
    40-42          59,800     1,515       97,900      159,215      6
    43-44          58,100     1,263       93,450      152,813      5
    45-47          56,500     1,010       87,220      144,750      4
    48-50          54,500       758       80,100      135,358      3
    51-55          51,800       505       72,535      124,840      2
    56-78          49,800         0       62,300      112,100      1

TABLE VI.—DECILE RANKS FOR ACTUAL SCORES ON PARTS II AND III (UNLIMITED
TIME) WHEN NOT MERELY ESTIMATED FROM PART I

Raw score (correct)     Raw score (correct)     Decile
     Part II                 Part III

                              27-35               10
                              25-26                9
                              24                   8
                              23                   7
                              22                   6
                              21                   5
                              19-20                4
                              18                   3
                              16-17                2
                              10-15                1

[The Part II column is missing from the source.]

which is arranged in order of difficulty so that the whole list need 
not be employed for younger children. 

(2) In giving Form 1 as a recognition test for children below the 
fourth grade and for very dull children above that grade, the test 
as a whole or at least any difficult portions should be given orally. 
Where children tend to give the last answer or the simplest word 
rather consistently it is probable that this portion of the test is too 
difficult for them, and the examiner may then read off the next list of 
words to be defined, testing only those of which the subject indicates 
that he knows something. In this case the number of words 
actually tried is substituted for the total of one hundred seventy-three 
in the scoring formula for basic words. (Total common basic 
vocabulary of Part 1 = (number of words attempted − 1.33 × 
errors) × 505.) Scores on use for rare words in Part 2 are almost 
completely absent at this level except for occasional geographic names 
of especial interest to the group. 

(3) In the case of certain very young or very dull children the 
tendency to be misled by superficial similarities among the wrong 
multiple-choice answers in a recognition test was great enough to 
reduce their corrected score to zero, even though they indicated by use 
criteria that they definitely did know a number of the easier words. 
In order to avoid emphasis upon this negative factor of misinformation 
it is, therefore, recommended that use criteria be employed whenever 
such results are encountered. It seems probable, however, that the 
majority of children above the fourth grade can take the multiple- 
response test by omitting the fifth page of Part 1. Further details for 
younger children must await more intensive investigation, but it is 
entirely reasonable to believe that their vocabularies will be described 
in thousands rather than in hundreds of words. 
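
The scoring formula for basic words given under point (2) may be illustrated as follows. This is a sketch only: the function and constant names are hypothetical, and flooring the result at zero is an assumption suggested by the remark under point (3) that some corrected scores were reduced to zero.

```python
WORDS_IN_PART_1 = 173     # total number of Part 1 words, as stated in the text
WORDS_PER_ITEM = 505      # multiplier stated in the text
GUESSING_PENALTY = 1.33   # correction factor stated in the text


def common_basic_vocabulary(errors: int, attempted: int = WORDS_IN_PART_1) -> float:
    """Estimate total common basic vocabulary from a Part 1 record.

    For the full test, `attempted` is 173; for a young child it is the
    number of words actually tried.
    """
    corrected = attempted - GUESSING_PENALTY * errors
    return max(0.0, corrected) * WORDS_PER_ITEM


# Full test with 26 errors (middle of the 21-31 band in Table V).
print(round(common_basic_vocabulary(26)))
# Hypothetical child who tried only 60 words and missed 12.
print(round(common_basic_vocabulary(12, attempted=60)))
```

With 26 errors on the full list the estimate is about 69,900, which agrees with the entry Table V gives for the 21-31 error band.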

In summary of the results for college students, it may be pointed 
out that the average college undergraduate in our groups recognized 
(after correction for guessing, as previously described) thirty-five per 
cent of the common basic words, one per cent of the rare basic words, 
and forty-seven per cent of the derivative words. The proportionate 
figures for a test on the whole dictionary would be 58,186; 1,662, and 
95,888, or a grand total of 155,736 words according to our dictionary 
definition of a word. The range of total scores for college under- 
graduates was 112,100 to 192,575. 

In comparing the scores from odd and even items of Part 1 basic 
words it was found that the average difference in the absolute scores 
of the two halves was only four per cent, and, of course, the full test 
should be even more reliable. For our restricted range of college 
undergraduates the corresponding reliability coefficients have also 
been determined. 

As measures of reliability we may cite the following correlations 
on two hundred twenty-six to two hundred thirty-seven undergraduate 
students in educational psychology. 
r odd-even, unlimited time, Part 1, Form 1   .71 ± .02; corrected* .83 ± .02 
r odd-even, twenty minutes, Part 1, Form 1   .87 ± .01; corrected  .93 ± .01 
r odd-even, thirty minutes, Part 1, Form 1   .73 ± .02; corrected  .84 ± .02 
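
The corrected coefficients can be checked against the prophecy formula in its usual form, r_full = k·r / (1 + (k − 1)·r), with k = 2 for stepping up a half-test correlation to the full length. The sketch below is a check on the arithmetic only; the function name is hypothetical.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


# Odd-even coefficients reported above for Part 1, Form 1.
for r in (0.71, 0.87, 0.73):
    print(r, round(spearman_brown(r), 2))
```

The three stepped-up values round to .83, .93, and .84, in agreement with the corrected figures listed.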


Intercorrelations of various scoring systems on the test are as 
follows: 


r 1st ten minutes vs. unlimited time        .45 ± .04 
r 1st twenty minutes vs. unlimited time     .67 ± .02 
r 1st ten minutes vs. 1st twenty minutes    .84 ± .01 


All three of the above intercorrelations are, of course, partly 
artificial since the score for longer time includes the score for shorter 
time. These are given only to illustrate the effect of various scoring 
systems, and it is obvious that speed and power are to some extent 
different variables. Either emphasis could be used if desired. Decile 
ranks for twenty-minute time limits are given in Table IV. 

This apparently low reliability is due not only to the restricted 
range of ability among our subjects, but also to the fact that the first 
two pages of the test are too easy for college students, being rarely 
missed except through carelessness by anyone above the first decile. 
The students, therefore, spend a fairly large proportion of their time 
on items which were non-discriminative for this group, and only the 
next two pages were within their range of difficulty, with a few of the 
abler students making a fair showing on page 5 of the test. It would 
involve no serious error for college students to be credited with correct 
answers to all words on pages 1 and 2 without requiring them to take 
these portions of the test. Pages 1 and 2 are necessary for use with 
younger or less highly selected groups, and page 5 is necessary for able 
students even among undergraduates. 

When the previously mentioned small group of sixty under- 
graduates was asked to mark each item on Part 1 as having been 
previously used in writing (W), or speaking (S), or both (WS), and 
also if they could possibly use it (P), the averages for the various types 
of use vocabularies were as shown in the table on page 35. 

To these figures we may add the previous findings on use criteria 
of rare basic words in Part 2 (1,662) and derivative words in Part 3 
(95,888), which together with the above figures for Part 1 gives an 
average use vocabulary of 153,202, almost as large as the average 
recognition vocabulary (155,736). 





* Corrected by Brown Spearman prophecy formula for full length of test. 
For fifty Northwestern freshmen the vocabulary score in unlimited 
time correlated r = .61 with the reading comprehension section of the 
Ohio State University entrance examination when given in twenty 
minutes instead of the standard thirty-six-minute time limit. For 
the same students the correlation with the Iowa English Examinations 
(language usage) was also r = .61. Both of these values are about the 
same as that found between such tests or general intelligence tests 
and college grades. 

On one hundred twenty Northwestern freshmen and sophomores 
Form 1 vocabulary scores (unlimited time) correlated r = .62 and 
r = .19 with the Iowa Reading Comprehension Test, Forms A and B, 
respectively. The correlation of Form 1 with college grades in 
elementary psychology was r = .57 (N = 120). 


V. SUGGESTIONS FOR FURTHER RESEARCH 


Although the field of vocabulary testing is a fairly old one, it 
has not been developed either intensively or extensively. Among 
the promising lines of further investigation which have been opened 
by the present study the following would seem to be immediately 
feasible: 

(1) The standardization of the present test for children of school 
grades I to XII, and comparison with previous results by other 
methods on development of vocabulary at the pre-school age. 

(2) Longitudinal studies in which the vocabulary development of 
the same children is followed over a period of years, or by Laswell’s 
method of interlapping age ranges for shorter periods of time. 

(3) The adaptation of the test for special age or talent ranges by 
determining which words may be omitted for determinations of 
absolute size as being either entirely too easy or too difficult. A 
similar project has already been tried for the selection of the one 
hundred most diagnostic words in the test for the measurement of 
relative differences only. This test is being tried out in various 
personnel uses, e.g., the prediction of success in acquiring technical 
training through a correspondence school. 

(4) The measurement of ‘‘depth” or richness of meanings, includ- 
ing multiple meanings of words (now under way). This may or may 
not be related to our measurements of single commonest meanings. 
Similarly the ability to discriminate fine differences between series of 
closely related words may or may not be a separate factor. 

(5) The measurement of effectiveness of various types of word 
training, e.g., vocabulary drills, word games, wide reading, extensive 
writing of different types, etc. Form II of the test is now being 
prepared for use in such measurements of initial and final status in 
reference to various types of learning. 

(6) Similar vocabulary tests of technical knowledge in specialized 
fields are being studied by Rexroad* and others as measures for 
prediction of scholastic success, and as measures of attainment in 
given courses of study. These may be constructed for any given field 
and are certain to be of interest to teachers. 

(7) In the analysis of individual differences in verbal abilities the 
vocabulary test can be employed either as a straight power test for 
level of ability, or as a speed test. Intercorrelation and factor 
analysis with other verbal tests such as Thurstone’s battery of primary 
abilities would be desirable. 
AN ANALYSIS OF THE RANGE OF TESTING AND 
SCATTERING AMONG RETARDED CHILDREN ON 
FORM L OF THE REVISED STANFORD BINET 


ARTHUR BERGER AND MORRIS SPEEVACK 


Psychologists, Board of Education, City of New York, Bureau for Children with 
Retarded Mental Development 


INTRODUCTION 


In discussing the desirable range for testing, Terman and Merrill 
state “ ... it is necessary to go back . . . until a level has been 
reached where all the tests are passed. This point is the basal age. 
In like manner the examination should be carried up the scale until an 
age level has been found in which all of the tests are failed. Thus in 
effect the subject is given only that portion of the scale which marks 
off the upper and lower limit of his ability.” 

In the clinical examination of retarded children referred for 
ungraded class placement the writers have noticed marked unevenness 
and scattering among the pupils. Although we recognize that the 
manifestations of intelligence are uneven and that the lengthening of 
the scale would naturally increase the range of testing, we find a large 
percentage of retarded pupils who scatter beyond what is generally 
considered the range of testing. 

Terman has recognized the necessity for a thorough test in the 
examination of defective children. He states: “Defectives are more 
uneven than normal children and therefore scatter their successes and 
failures over a wider range.”² With such subjects it is absolutely 
imperative that the test be thorough. 

How far should the test be carried to be considered thorough? 
Terman has noted that ‘‘It sometimes happens that a subject passes 
all of the tests at a higher age level than the one in which the first 
failure occurs. Again he may pass tests beyond the age group at 
which he has failed all tests. In testing normal children little error will 
result if we go back no farther than the first year in which no failures 
occurred and if we stop with the first year in which there was no 
success. In computing mental age all successes and failures are taken 





1 Terman, L. M. and Merrill, M. A.: Measuring Intelligence. Boston: Houghton
Mifflin Company, 1937, p. 63. 

2 Terman, L. M.: The Measurement of Intelligence. Boston: Houghton 
Mifflin Co., 1916, p. 129. 
account of, including any failures that may have occurred below the basal 
age level and successes beyond the first year at which all tests have been 
failed.”¹ 


SOURCE OF DATA 


A random sampling of one hundred and ninety-six cases of children 
in elementary schools who were referred for examination for ungraded 
class placement was taken. The CA range of this group was from 
seven years and one month to fifteen years and ten months. Table I 
indicates the number of subjects according to age. 


TABLE I.—STANDARD AND CORRECTED IQ RANGE OF ONE HUNDRED NINETY-SIX 
PUPILS REFERRED FOR EXAMINATION ACCORDING TO AGE 

                  IQ range      Mean IQ       IQ range     Mean IQ      Number
Age               (corrected)   (corrected)   (standard)   (standard)   of cases

 7-0 to  7-11       65-101        79.3          65-101       78.86          7
 8-0 to  8-11       66-111        80.7          66-107       79.0          23
 9-0 to  9-11       56- 94        74.3          56- 90       73.0          31
10-0 to 10-11       53- 95        74.1          50- 95       72.4          44
11-0 to 11-11       58- 89        73.0          56- 89       71.55         27
12-0 to 12-11       48- 97        70.8          48- 97       69.3          29
13-0 to 13-11       53- 88        65.85         52- 88       65.25         20
14-0 to 14-11       53- 85        69.0          53- 85       69.0           6
15-0 to 15-11       55- 81        68.8          53- 81       67.8           9
Total                                                                     196

TREATMENT OF DATA 


The practice of the writers in the administration of Form L was to 
carry out the test to at least two years in which there were all failures. 
In many of these cases the two years of failures were consecutive. 
Often after a year level in which all items were failed was reached, the 
pupil would succeed on an item or items beyond this level. The test 
was then continued until another year of complete failures was reached. 

It would have been desirable to follow the same procedure at the 
lower end of the scale to determine to what extent scattering takes 
place at the basal ages. Because of factors such as fatigue and loss of 
rapport, which are discussed elsewhere in this paper, this was not done. 





1 Terman, L. M. and Merrill, M. A.: Op. cit., p. 63. 
A comparison was made between the MA obtained if the test was 
not continued beyond the first year in which all items were failed and 
the MA obtained when the test was carried beyond that point. Table 
II shows according to age the range of increase in MA when the test is 
continued beyond the first year of complete failure. For convenience 
we shall refer to the first year of all failures as the first zero point and 
the final year of failure on all test items as the final zero point. 


TABLE II.—DIFFERENCES IN MENTAL AGE OBTAINED WHEN TEST IS CARRIED 
BEYOND THE FIRST ZERO POINT 

                         All cases                Cases showing change in IQ

                          Range of     Average              Range of     Average
                          increase     increase             increase     increase
Age              Number   in months    in months   Number   in months    in months
                          of MA        of MA                of MA        of MA

 7-0 to  7-11       7       0-2          0.7          3       2            2.0
 8-0 to  8-11      23       0-6          1.7         12       2-6          3.2
 9-0 to  9-11      31       0-8          1.5          7       2-8          2.7
10-0 to 10-11      44       0-10         2.0         22       2-10         4.0
11-0 to 11-11      27       0-8          2.1         15       2-8          3.9
12-0 to 12-11      29       0-14         1.9         13       2-14         4.2
13-0 to 13-11      20       0-4          0.9          7       2-4          2.6
14-0 to 14-11       6       0            0            0       0            0
15-0 to 15-11       9       0-4          1.3          4       2-4          3.0
Total             196                                83

Increases in mental age were found in forty-two per cent of the 
cases. The average of these increases was three and two-tenths 
months. Statistically this does not appear to be a large change. 
However, in clinical practice, it appears that the change is great 
enough to warrant the procedure of extending the test beyond the 
first zero point. 
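
What a gain of a few months of MA means for an individual IQ can be illustrated with the conventional ratio IQ = 100 × MA / CA. The sketch below assumes this plain ratio, with no adjustment of chronological age for older subjects; the function names and the sample child are hypothetical and are not taken from the cases studied.

```python
def ratio_iq(ma_months: float, ca_months: float) -> float:
    """Conventional ratio IQ from mental and chronological age in months."""
    if ca_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * ma_months / ca_months


def iq_gain(ma_first_zero: float, ma_final_zero: float, ca_months: float) -> float:
    """IQ points gained by testing beyond the first zero point."""
    return ratio_iq(ma_final_zero, ca_months) - ratio_iq(ma_first_zero, ca_months)


# Hypothetical child, CA 10-0 (120 months): MA 88 months at the first
# zero point, 92 months when testing is carried to the final zero point.
print(round(iq_gain(88, 92, 120), 1))
```

For this hypothetical child a four-month increase in MA raises the IQ by a little over three points, a difference that could matter when a placement decision rests on a cutoff score.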

A criticism may be made that this is not standard procedure and 
that the IQ’s obtained by this method are not comparable to the IQ’s 
obtained under the usual test procedure. Terman and Merrill 
indicate that ‘‘In the application of the provisional scales to more than 
three thousand subjects upon whom the standardization was based, 
the testing in all cases covered a very comprehensive range. The 
examination of each subject was extended downward through an 
age level in which there were no failures and upward through an age 
level in which there were no successes, according to the provisional 
standards of scoring.”¹ While this was the procedure used in the 
standardization of the scale, they indicate, as quoted elsewhere in this 
article, that all successes and failures should be taken account of in 
computing mental age. Since clinical practice deals with individuals 
rather than with measures of central tendency, the range of change of 
IQ appears significant. 

The question as to the optimum level to which the examination 
may be extended bears further investigation. In extending the test it 
is necessary, of course, to increase the number of items given to the 
subject. It also requires the child to respond to items of a higher 
difficulty. The problem of when the fatigue point is reached as well as 
the loss of rapport due to increase in difficulty of items is not answered 
by the present study. 

The writers have touched on the problem by making a tabulation 
of the number of test items given each child. The computation of 
test items is based on the number of items administered from the basal 
year to the last zero point before which there is a success. 


TABLE III.—NUMBER OF ITEMS ADMINISTERED TO EACH CHILD FROM THE BASAL 
YEAR TO THE LAST ZERO POINT PRECEDED BY A SUCCESS 

                                  Number of items
Age             12  18  24  30  36  42  44  48  50  54  56  60  62  66

[Entries for the separate age groups, 7-0 to 15-11, are illegible in the source.]

Total (196)      1   6  15  21  34  35   3  28   2  36   5   3   6   1

A study of the number of items it is necessary to administer might 
throw some light on the question of fatigue. 





1 Terman, L. M. and Merrill, M. A.: Op. cit., p. 11. 
A study of the number of year levels beyond the child’s chronological
age to which the test must be carried for completion, according to 
the suggested method, might throw some light on the question of loss 
of rapport. A retarded child of nine, for example, might be expected 
to respond to a few items at his own chronological age level. He might 
even succeed in passing a few items at a year level or two beyond his 
chronological age. However, in extending the test to a point where 
the items are at a difficulty far above his chronological age our clinical 
sense tells us that we are losing the child’s interest. Even dull 
children have enough insight to recognize failure in their responses. 
This fact should be even more outstanding if mental age rather than 
chronological age were used as a basis. A tabulation was made show- 
ing the number of years of test levels above the chronological age of the 
child to which the test had to be carried. 

It was found that somewhat over one fourth of the cases were given 
items from three to six years above their chronological age level. 
Discouraging as this must be to the retarded child, a consideration of 
the number of year levels above the mental age to which the test must 
be carried gives a truer picture of the difficulty the child encounters. 
Of the entire group in this study eighty-six per cent were given items 
three years or more above their mental age. 


TABLE IV.—TEST ITEMS PASSED AFTER FIRST ZERO POINT WAS REACHED AND 
FREQUENCY WITH WHICH THESE ITEMS WERE PASSED 

Year level                 Item number
on Form L       1     2     3     4     5     6

    8           2     2     0     0     1     0
    9           2     0     7     1     7     4
   10           0    16     0     0     3     8
   11          [?]    2     1    [?]    2     1
   12           0     4    10     7     1     1
   13          12     9     6    12     1    [?]
   14           0     1     3     2     2     1
   AA           0     1     0     1     1     1

[?] marks an entry that is illegible in the source.

Items 5 or more year levels above the pupil’s MA were administered 
in almost forty per cent of the cases. Although this does not give us 
a definite measure of loss of rapport, it seems logical that when a child 
is confronted with items of such difficulty, he would easily become 
discouraged. Further investigation into a means of determining the 
loss of rapport would be valuable. 

A study of the actual test items passed by the pupils in this group 
after the first zero point had been reached was made to determine 
which particular item or items the pupils had a tendency to succeed in, 
above what would generally be considered their upper limit. Table IV 
shows the test items passed after the first zero point was reached and 
the frequency with which they passed these items. 

Among the items which show frequent successes beyond the first 
zero point are the drawing of designs from memory, the making of 
change, the picture absurdity (frontier days) and the word memory 
and problems of fact at the thirteen-year level. 


SUMMARY 


The scatter on Form L of the Revised Stanford Binet administered 
to one hundred ninety-six pupils referred for examination because of 
suspected mental retardation was investigated. 

When the test was extended beyond the first year level in which all 
the items were failed the MA was increased on the average by two 
months, the range of increase being from zero to fourteen months. 
Increases in mental age were found in forty-two per cent of the cases. 
In the group which showed increases in MA, the average increase 
was three and two-tenths months. 
REVISED IOWA HAND USAGE DEXTRALITY 
QUOTIENTS OF SIX-YEAR-OLDS* 


WENDELL JOHNSON AND DARLENE DUKE 


University of Iowa 


I. INTRODUCTION 


In this paper Revised Iowa Hand Usage Test dextrality quotients 
(DQ) for fifty six-year-old children are presented and compared with 
those previously reported by Johnson and Davis² for one hundred 
seven-year-olds. These scores were obtained by means of the same 
test as that used by Johnson and Davis in the above mentioned 
investigation, which test is a revised form of the original Iowa Hand 
Usage Test devised by Johnson and Duke.¹ In our earlier paper 
concerning the original form of the test, we reported hand usage DQ’s 
for fifty six-year-olds. Since these early scores for six-year-olds are 
not directly comparable with the DQ’s for seven-year-olds obtained by 
means of the revised form of the test by Johnson and Davis, we have 
collected the DQ’s here reported in order that a more valid comparison 
of six- and seven-year-old children might be made with regard to hand 
usage as here measured. 


II. SUBJECTS AND PROCEDURE 


The test used in this study has been described in detail by Johnson 
and Davis.² The test involves the performance of thirty-two activities,
each activity being performed twice during the test, and the hand 
used in performing each operation being noted. The activities were 
selected in the main from a list of one hundred twenty-five kinds of 
unimanual activity noted in the course of eleven hundred fifty-three 
observations of fifty six-year-old children in the classroom. A detailed 
account of the construction, validation, standardization, and revision 
of the test is to be found in the previous reports mentioned above, 
which were published in the 1936 and 1937 volumes of this JOURNAL. 

The test items are here listed in the order in which they are per- 
formed by the subject. It will be noted that there are thirty-two 
activities and sixty-four performances, since each activity is performed 
twice. Copies of the list may be made, and opposite each item the 
tester may record whether the activity was performed with the right 





* From the Speech Pathology Laboratory of the University of Iowa and the 
Iowa Child Welfare Research Station. 
hand, left hand, or with both hands, neither hand predominating. 
The detailed instructions published by Johnson and Davis² are to be 
used in conjunction with the following list: 


1. Pull down curtain. 33. Pick up card. 
2. Pull down curtain. 34. Pick up card. 
3. Take article from desk. 35. Lay down card. 
4. Take article from desk. 36. Lay down card. 
5. Tear paper from tablet. 37. Tear paper from tablet. 
6. Turn over paper. 38. Turn over paper. 
7. Sharpen pencil. 39. Fold paper. 
8. Write with pencil. 40. Sharpen pencil. 
9. Erase with pencil-tip eraser. 41. Turn page of book. 
10. Write with pencil. 42. Turn page of book. 
11. Draw a picture. 43. Draw a picture. 
12. Put pencil in desk. 44. Put pencil in desk. 
13. Take crayon from box. 45. Pick up scissors. 
14. Take crayon from box. 46. Put scissors in desk. 
15. Color with crayon. 47. Pick up pen. 
16. Color with crayon. 48. Take top off pen. 
17. Put crayon in box. 49. Take top off ink bottle. 
18. Put crayon in box. 50. Fill pen. 
19. Close crayon box. 51. Write with pen. 
20. Close crayon box. 52. Put top on pen. 
21. Pick up scissors. 53. Erase with pencil-tip eraser. 
22. Put scissors in desk. 54. Fold paper. 
23. Point to block. 55. Pick up pen. 
24. Point to block. 56. Take top off pen. 
25. Pick up block to pile. 57. Take top off ink bottle. 
26. Place block on pile. 58. Fill pen. 
27. Pick up block to pile. 59. Write with pen. 
28. Place block on pile. 60. Put top on pen. 
29. Pick up block to carry. 61. Pick up chalk. 
30. Pick up block to carry. 62. Write with chalk. 
31. Point to card. 63. Pick up chalk. 
32. Point to card. 64. Write with chalk. 


In the present study the test was given twice to each child in order 
to make possible the computation of test-retest reliability. The 
interval between the first and second tests varied from one day to 
one week. 

The test is scored in terms of the dextrality quotient. “The DQ 
is to be defined as the percentage of the total achievement involved in 
any test of handedness which is to be credited to the right hand.”¹ 
The formula for obtaining the DQ is $\mathrm{DQ} = \frac{R + \frac{1}{2}B}{N}$, in which R represents 
the number of performances in which the right hand predominates, 
and B the performances in which both hands take part, neither 
predominating. N is the total number of performances. The formula 
can be used for scoring any test of handedness and thus by means of it 
all handedness measures are rendered directly comparable, a long- 
needed and basic advantage in this field of study. 
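
The computation of a DQ from a scored record may be sketched as follows. The sketch assumes that the formula is (R + B/2)/N, i.e., that performances in which neither hand predominates are credited half to the right hand; the function name and the sample record are hypothetical.

```python
def dextrality_quotient(right: int, both: int, total: int) -> float:
    """DQ = (R + B/2) / N, assuming two-handed performances count one half."""
    if total <= 0:
        raise ValueError("total number of performances must be positive")
    if right < 0 or both < 0 or right + both > total:
        raise ValueError("R and B must be non-negative and sum to at most N")
    return (right + both / 2) / total


# Hypothetical record on the 64-performance test: 50 right-handed
# performances, 5 with both hands, and the remaining 9 left-handed.
print(round(dextrality_quotient(50, 5, 64), 2))
```

A child using the right hand throughout obtains a DQ of 1.00, and one using the left hand throughout a DQ of .00.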

The children tested in the present investigation were all drawn from 
the Grand Rapids, Michigan, public schools. All cases of known 
mental deficiency, speech defects, and physical abnormalities which 
might affect hand usage were excluded. Sixteen of the subjects were 
boys and thirty-four were girls. They ranged in age from six years, 
eleven days, to six years, eleven months, fifteen days. The median 
age was six years, six months, twenty-three days. 

While the scores to be reported in this paper were obtained by 
means of the procedure indicated above, the testing was so done that 
the performances could be scored in terms of the original Johnson-Duke 
test. This was done in order to check further the advantages or dis- 
advantages of the revised form of the test. The manner of administer- 
ing the test which made it possible to score it in two ways was that 
followed by Johnson and Davis² and described by them. 


III. RESULTS 


Reliability.—DQ’s were computed for each child for the first test 
(T₁) and the retest (T₂). The coefficient of correlation between T₁ 
DQ’s and T₂ DQ’s was .90 ± .018 (scored according to the revised 
form of the test). When scored according to the old form of the test 
the coefficient of correlation between T₁ DQ’s and T₂ DQ’s was 
.81 ± .033. These coefficients corroborate the Johnson-Davis findings 
in showing that the revised form of the test is the more reliable. 
Johnson and Davis reported that the test-retest reliability of the 
revised form of the test for one hundred seven-year-olds was .91 ± .012. 

A further check on the reliability of the test was made by computing 
the difference and the standard error of the difference between the DQ’s 
of each child for T₁ and T₂. Again, scores obtained in terms of the 
original and the revised forms of the test, respectively, were compared. 
Considering as significant any difference which is three or more times 
larger than its standard error, no child presented a significant difference 
between T₁ and T₂ in terms of the revised test; three children had 
significant differences in terms of the unrevised test. Thirty-four of 
the children had lower critical ratios in terms of the revised test than in 
terms of the old test. Again, these figures indicate that the revised 
form of the test is the more reliable. The test may be regarded as 
having a comparatively high reliability. 

Tentative Norms for Six-year-olds Based on Fifty Cases.—Tentative 
norms are given in the following tabulation for the fifty six-year-olds 
tested in this study. The Johnson-Davis norms for one hundred 
seven-year-olds are given for purposes of comparison. 

Percentile     Six-year-olds DQ’s     Seven-year-olds DQ’s 

     0               .07                    .12 
    10               .52                    .63 
    20               .65                    .71 
    30               .74                    .78 
    40               .80                    .81 
    50               .82                    .83 
    60               .84                    .85 
    70               .86                    .88 
    80               .90                    .92 
    90               .94                    .95 
   100              1.00                   1.00 

The T₁ DQ’s for the Revised Iowa Hand Usage Test, for both
groups of children, were used in the above tabulation. The 0 and
100th percentiles, respectively, represent the lowest and the highest
DQ in each group.

It will be noted that the median DQ for the six-year-olds was .82
and the median for the seven-year-olds was .83. With regard to the
six-year-olds, the median score for the sixteen boys was .81, and the
median for the thirty-four girls was .82. For the seven-year-olds,
the median was .82 for the boys and .85 for the girls; there were fifty
boys and fifty girls.

In their report, Johnson and Davis presented a graph for their
seven-year-olds. It was essentially a single mode curve piled toward
the right-hand side of the distribution. Obviously, a curve of the
distribution of DQ’s for our six-year-olds would be of the same type.
This was true, also, of the distribution curve of the DQ’s of our original
group of fifty six-year-olds with whom the unrevised form of the test
was used.¹

Effect of Age on Hand Usage.—The entire distribution of scores
shifts slightly toward the right-hand end of the scale when we go from 
the six-year-level to the seven-year-level. This shift is practically 
negligible in the upper half of the distribution. It is more evident 
toward the left-hand end of the distribution. Thus, these studies
indicate that the more left-handed a child is in terms of hand usage, 
as here measured, the greater will be his shift in the direction of right- 
handedness upon passing from the six- to the seven-year level. One 
might express these facts in a somewhat different statement—subject 
to any change dictated by other and future research—by saying that 
it is the rule rather than the exception for such “left-handed” children 
to experience more or less “change of handedness.” These two
statements could be most adequately checked by means of genetic 
studies. Further comments in this connection are to be reserved for 
the forthcoming report of a study of the hand usage DQ’s of high- 
school-age subjects. 

Consistency in Hand Usage.—Incidental observations were made of 
the consistency with which one hand or the other was used for highly 
similar activities. For example, it will be noted that six of the activi- 
ties in the test involve some kind of writing implement: Writing with 
pencil, writing with pen, writing with chalk, coloring, drawing, and 
erasing with pencil-tip eraser. A check of the individual records 
revealed that all of the fifty subjects were consistent in hand usage 
for these six activities (if a given child performed one of them with the 
right hand, he performed all of them with the right hand, and vice 
versa). If this were to hold true in other studies, it would justify 
shortening of the test by eliminating five of these items. Meanwhile, 
they should be retained until more studies have been done. The find- 
ing from the present study does indicate that for six-year-olds, at 
least, writing is a unitary function as far as hand usage is con- 
cerned, regardless of the writing tool that may be used—whether pen, 
pencil, chalk, or crayon—and regardless of whether the child writes 
script, draws a picture, colors, or even uses a pencil-tip eraser. 

The above finding may be due to the fact that the writing function 
is taught in a relatively formal way. At least, when we consider some 
of the other activities in the test, not formally taught to any significant 
degree, the same tendency is not found. There are five “put” and
eight “take” operations. The “put” operations are putting pencil in 
desk, putting scissors in desk, putting crayons in box, placing blocks, 
and laying down cards. A check of the individual records showed that
only seventeen of the fifty children were consistent in the hand used
for these five activities. Even with regard to putting pencil in desk 
and putting scissors in desk, fifteen of the children used one hand for 
one of these operations and the other hand for the other, or at least 
used the hands differently for them (since two performances of each 
operation were made, it was possible for a subject to perform an opera- 
tion both times with the right hand, or both times with the left hand, 
or once with the right and once with the left, or even to complicate the 
record by using both hands more or less equally for one or both per- 
formances). With regard to placing blocks and laying down cards, 
sixteen children performed inconsistently. 

The eight “take” operations were taking articles from desk, taking
crayons from box, picking up scissors, picking up blocks to pile, 
picking up blocks to carry, picking up cards, picking up chalk, and 
picking up pen. With regard to all eight operations, considered 
together, only five children were consistent in hand usage, although 
the operations are quite similar in that they all involve prehension. 
If the six “picking up” items are considered, only twelve of the 
children performed consistently. In picking up blocks to pile and 
picking up blocks to carry, one might tend to assume that there would 
be very little inconsistency in hand usage, but as a matter of fact 
seventeen of the children were inconsistent. Perhaps the most reveal- 
ing fact in this connection was the following: It will be remembered 
that all the children were consistent in hand usage as between writing 
with a pen and writing with chalk. Nevertheless, as between picking 
up the pen and picking up the chalk, twenty of the fifty children 
were inconsistent with regard to the hand used for the “picking 
up.” 

Here would seem to be evidence indicating that hand usage prac- 
tices tend to be consistent (involve the same hand) in proportion to the 
degree that their development is formally taught and supervised. In 
other words, the findings cited in this section appear to indicate that 
insofar as handedness is consistent, in terms of hand usage as here 
measured, it is to a considerable degree “learned” rather than 
“inherited.” 


IV. SUMMARY AND CONCLUSIONS 


The Revised Iowa Hand Usage Test was administered to fifty 
six-year-olds, sixteen boys and thirty-four girls. Children with speech
defects, known mental deficiency, and physical conditions which might
affect hand usage were excluded. The test is scored in terms of the 
dextrality quotient, DQ, which is to be defined as the percentage of 
achievement in any test of handedness which is to be credited to the 
right hand. The test-retest reliability of the test as determined in this 
study is .90 ± .018, and as determined by Johnson and Davis² in
testing one hundred seven-year-olds it is .91 ± .012.
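
The definition of the dextrality quotient can be illustrated with a short calculation. This is a sketch under a stated assumption: "achievement" is represented here simply as a count of credits assigned to each hand, which is a simplification introduced for illustration. The function name and the sample counts are hypothetical and are not taken from the test manual.

```python
def dextrality_quotient(right_credit, left_credit):
    """Share of total credited achievement that goes to the right hand.

    Assumption: both arguments are non-negative credit totals for one child.
    """
    total = right_credit + left_credit
    if total == 0:
        raise ValueError("no performances recorded")
    return right_credit / total

# Invented example: 41 credits to the right hand, 9 to the left.
dq = dextrality_quotient(41, 9)
print(f"DQ = {dq:.2f}")
```

On this reading a DQ of 1.00 means every credited performance was right-handed, and a DQ near .50 means the two hands shared the work about equally.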

The DQ’s of the fifty six-year-olds tested are not distributed in a 
clearly bimodal fashion, but rather in a skewed and essentially uni- 
modal manner. The median DQ for the six-year-olds is .82. This 
means that in terms of the test here used, and with reference to the 
children tested, the median six-year-old is eighty-two per cent right- 
handed as far as hand usage is concerned. Comparison of the present 
six-year-old norms with the seven-year-old norms of Johnson and Davis 
indicates that in terms of the test and the subjects here considered, 
degree of right-handedness increases slightly from the sixth to the 
seventh year. 

Without attempting here to refer specifically to any one of the large 
number of studies that have involved measures of handedness, we want 
to register the judgment that handedness has been shown to be of such 
an extremely low order of differentiation as to make the word quite 
useless as a technical term. The studies in the present series bear this 
out definitely. To ask whether a given individual is right-handed or 
left-handed is to come close, at least, to asking a factually meaningless 
question. One would be justified in answering: “Do you mean when
picking up a pen or when picking up a piece of chalk?” (See the
foregoing section on “consistency in hand usage.”) The term
“handedness” would appear to make sense only when relatively gross
differentiations among individuals are required. Moreover, the old
assumption that handedness is distributed bimodally can not be evaluated
except in terms of some standardized operation of measurement. This 
series of studies has shown that for six- and seven-year-olds, hand 
usage, as here measured, is not bimodally distributed. Therefore, the 
studies do not justify the policy of classifying children of these age 
levels in the two categories, left-handed and right-handed. At least, 
if these two categories are to be regarded as the extremes of the 
distribution, then apparently the majority would have to be classified 
as ambidextrous. It goes without saying that any theories concerning 
the relation of handedness to other functions (such as speech defects,
for example) stand to be evaluated with reference to the present
findings and other related data previously reported.


THE VALIDITY OF THE NOLL TEST
OF SCIENTIFIC THINKING


GLENN M. BLAIR 


University of Illinois 


One of the most important functions of the school should be to 
develop in children the ability to think clearly and to face problems in 
an unprejudiced manner. Children should learn early to discriminate 
between fact and opinion, and should develop the habit of suspending 
judgment when data at hand do not warrant the formation of a conclusion.
In a word, children should learn to exhibit a scientific attitude
toward the problems of life. Since this is one of the commonly 
accepted goals of education, some means should be devised for ade- 
quately measuring improvement in this trait. 

The writer recently became interested in studying the extent to 
which high-school freshmen exhibit a scientific attitude toward prob- 
lems, and also the extent to which this trait can be improved by proper 
teaching. He then cast about to find a suitable measuring stick of the 
scientific attitude. The only available standardized instrument which 
purported to measure this phase of conduct was the Noll test—“What
Do You Think?, A Test of Scientific Thinking.” This test, which has
been constructed for use in grades 7-12, defines the scientific attitude 
in terms of the following six habits of thinking: (1) habit of intellectual 
honesty, (2) habit of looking for true cause and effect relationships, 
(3) habit of suspended judgment, (4) habit of open-mindedness, 
(5) habit of criticism, including self-criticism, and (6) habit of 
accuracy in all operations, including calculation, observation, and 
report. 

The Noll test consists of two forms—Form 1 and Form 2. Each 
form is made up of one hundred twelve items. To illustrate the type 
of items that are used, some examples are given below: 


(1) There is no use working any harder than is necessary to “get by.” 

(2) A high forehead is a sign of intelligence. 

(3) Our next president will be a Democrat. 

(4) Any nation that persecutes the Jews, as Germany has recently done, 
must be totally uncivilized. 

(5) If my teacher says a thing is so, it must be so. 





1 Published by the Bureau of Publications, Teachers College, Columbia
University, New York, 1935.

Item 1 above is supposed to measure intellectual honesty; item 2,
habit of looking for cause and effect relationships; item 3, habit of 
suspended judgment; item 4, habit of open-mindedness; and item 5, 
habit of criticism. The trait of accuracy is measured by having the 
subject answer some questions about a rather complicated diagram, 
and by having him arrange some words according to alphabetical order. 
The instructions call for statements believed to be true to be marked
with a “T,” those that are false to be marked with an “F,” while those
statements which can be marked neither true nor false are to be
designated with a “?.”

Although the Noll test was selected as being the most suitable test 
of its kind available, the writer decided it would be well to check its 
validity before giving it any wide use. 

If the test really measures scientific thinking, it is the opinion of the 
writer that well-known scientists should not only make high scores on 
the test, but should also agree very closely among themselves as to the 
correct answers to the various questions. If scientists cannot answer 
the questions correctly, it would seem doubtful whether it should be 
called a test of scientific thinking. Thus it is assumed that first-rate 
scientists possess the scientific attitude and the ability to do scientific 
thinking, if such a trait exists. 

Consequently, Forms 1 and 2 of the test were given to sixteen 
scientists on the faculty of the University of Illinois, and they were 
requested to “take the tests” if they had time to do so. Each of the
sixteen men filled in completely both Forms 1 and 2 of the test. Every 
one of these scientists holds the Ph.D. degree, and has a rank of asso- 
ciate professor or higher in the University. The branches of science 
represented by the group are: bacteriology, botany, chemistry, 
entomology, physics, psychology, and zoology. A number of the men 
in this group are national and international authorities in their respec- 
tive fields. 

The purpose of this paper is to report the results of giving the Noll 
Test of Scientific Thinking to the sixteen scientists who have just been 
described. 

In Table I, the distributions of scores made by the scientists are 
given. 

It can be observed from Table I that none of the scientists gave 
answers that agreed in every respect with the answer key, since a 
perfect score on either form of the test would be 112. The average 
score made by the scientists on Form 1 is 93.1, while on Form 2 the
average score is 87.1. The results do indicate, however, that the
scientists do somewhat better on the tests than school children.
According to the published norms, the median score made by twelfth-grade
children is 81 for both Forms 1 and 2. [Thirty-two college juniors
and seniors who took these tests under the direction of the writer, made
an average score of 89.8 on Form 1, and 86.7 on Form 2.]


TABLE I.—DISTRIBUTION OF SCORES MADE BY SIXTEEN SCIENTISTS ON THE NOLL
TEST OF SCIENTIFIC THINKING

Score        Form 1, F    Form 2, F
100-104          3            1
 95-99           8            1
 90-94           1            7
 85-89           2            2
 80-84           0            2
 75-79           1            0
 70-74           1            3
              N = 16       N = 16

Average score on Form 1, 93.1; on Form 2, 87.1.


The most striking fact observed from the obtained data, however, 
is the extent to which the scientists disagree with certain of the answers 
of the standard scoring key. The first seventy-five questions of each 
form of the test are given over to the measuring of the habit of intel- 
lectual honesty, habit of looking for cause and effect relationships, 
habit of suspended judgment, habit of open-mindedness, and habit of 
criticism. In Table II, an analysis is made of each of these seventy- 
five items to note where the answer key and the responses of the 
scientists are at variance. 

From a glance at this table it can be seen that in numerous instances 
the scientists do not agree with the published answers. For example, 
in the third question of Form 2, not a single scientist marked his test 
in the correct manner according to the answer book. If a school child 
taking this test should mark this question in exactly the same way as 
did every one of the sixteen scientists, his question would be marked 
wrong and he would be considered unscientific in his thinking. This 
particular item reads as follows: “Machinery will eventually displace
manual labor completely.” The scoring key gives the only acceptable
answer as being “?,” but every one of the sixteen eminent scientists
marked this statement “F.” There are also two other items on
Form 2, items 5 and 10, where not a single scientist gave the approved
answer, and yet these items as regularly scored are supposed to measure
scientific thinking.

* This signifies that fifty per cent of the scientists answered question 1 of Form 1
in the same manner as is found in the scoring key.

If it were held that for an item to be considered valid, at least 
three-fourths of the scientists should agree with the answer key, there 
would be twenty-six invalid items in Form 1 and twenty-five invalid 
items in Form 2. 

A revision of the scoring key which would make it agree with the 
responses given by the sixteen scientists would no doubt increase the 
validity of the test considerably. The published scoring key, according 
to the manual of directions,¹ is based on the responses of a small group
of experts. It is not stated in the manual how small the group of 
experts was or just who they were. It is possible that this was too 
small a group with which to establish with any degree of certainty the 
appropriate answers to the various statements. 


REVISED SCORING KEY 


Using the data obtained in this investigation, it would thus be 
possible to revise the scoring keys of the Noll test. For example, if a 
majority of the sixteen scientists, nine or more, agree with the answers 
of the original key for a given item, no change in the key need be made. 
New answers could be given for those items where a majority of the 
scientists agree upon an answer which differs from that given in the 
regular key. Items upon which there is no clear-cut majority response, 
should no doubt be dropped from the test altogether. 
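
The revision rule stated above lends itself to a short procedural sketch. The code below is an illustration written for this purpose, not a procedure given in the manual or by the writer: the function name, the status labels, and the encoding of unanswered items as `None` are all assumptions. The two sets of responses are the counts reported in this paper for question 3 of Form 2 and question 1 of Form 1.

```python
from collections import Counter

def revise_item(key_answer, responses, min_agree=9):
    """Apply the majority rule to one item.

    Returns ("keep", answer) if at least `min_agree` scientists gave the
    key answer, ("revise", answer) if that many agreed on a different
    answer, and ("drop", None) if no answer reached the threshold.
    Unanswered responses are encoded as None (an assumption) and ignored.
    """
    counts = Counter(r for r in responses if r is not None)
    if counts:
        answer, votes = counts.most_common(1)[0]
        if votes >= min_agree:
            status = "keep" if answer == key_answer else "revise"
            return status, answer
    return "drop", None

# Form 2, question 3: key "?", all sixteen scientists marked "F".
form2_item3 = ["F"] * 16
# Form 1, question 1: key "?", one "T", six "F", eight "?", one unanswered.
form1_item1 = ["T"] * 1 + ["F"] * 6 + ["?"] * 8 + [None]

print(revise_item("?", form2_item3))
print(revise_item("?", form1_item1))
```

Setting `min_agree=12` would correspond to the stricter three-fourths criterion for sixteen respondents. The threshold should exceed half the group; otherwise two answers could tie and the choice between them would be arbitrary.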

Using this procedure, new answers would be necessary for three 
questions of Form 1, and fourteen questions of Form 2. Furthermore, 
eight questions of Form 1, and four questions of Form 2 would be 
eliminated from the test, since there is no agreement among the 
scientists as to proper answers to these questions. 

Table III indicates the changes in answers which should be made in 
the original scoring key, while Table IV shows which questions should 
be eliminated entirely. 

Although this procedure leaves much to be desired, it should give
the test more validity than it now possesses, since school children would
no longer have answers marked wrong which agree in every respect
with those given by a majority of the scientists who took the tests. It
would, of course, be better to include in the revised test only items
upon which there was at least seventy-five per cent agreement on the
part of the scientists. This would, however, necessitate the elimination
of so many items that the tests would suffer greatly because of
inadequate length.

1 Noll, Victor H.: The Habit of Scientific Thinking: A Handbook for Teachers,
Bureau of Publications, Teachers College, Columbia University, 1935, p. 22.

TABLE III.—REVISED ANSWERS FOR THE NOLL TEST OF SCIENTIFIC THINKING

           Question       Original       Revised        Percentage of scientists giving
           number         answer         answer         revised answer (N = 16)
Form 1       35           [illegible]    [illegible]          81.3
             46           [illegible]    [illegible]          62.5
             60           [illegible]    [illegible]          81.3
Form 2        1           [illegible]    [illegible]          68.8
              3           “?”            “F”                 100.0
              5           “?”            [illegible]         100.0
         [illegible]      [illegible]    [illegible]          56.3
             10           [illegible]    [illegible]          87.5
             20           “T”            [illegible]          68.8
             37           [illegible]    [illegible]          62.5
             39           [illegible]    [illegible]          62.5
             42           [illegible]    [illegible]          62.5
             52           [illegible]    [illegible]          87.5
             56           [illegible]    [illegible]          68.8
             61           [illegible]    [illegible]          81.3
             63           [illegible]    [illegible]          87.5
             67           [illegible]    [illegible]          56.3

TABLE IV.—QUESTIONS WHICH SHOULD BE ELIMINATED FROM THE NOLL TEST OF
SCIENTIFIC THINKING

           Question    Answers in     Answers given by the scientists
           number      Noll key       (N = 16)
Form 1        1        “?”            (“T” 1, “F” 6, “?” 8, not answered 1)
              4        [illegible]    (“T” 3, “F” 5, “?” 8)
              7        [illegible]    (“T” 2, “F” 8, “?” 6)
             15        [illegible]    (“T” 1, “F” 6, “?” 7, not answered 2)
             21        [illegible]    (“T” 6, “F” 6, “?” 4)
             30        [illegible]    (“T” 7, “F” 3, “?” 6)
             48        [illegible]    (“T” 7, “F” 8, “?” 1)
             54        [illegible]    (“T” 3, “F” 5, “?” 8)
Form 2       43        [illegible]    (“T” 8, “F” 2, “?” 5, not answered 1)
             57        [illegible]    (“T” 6, “F” 8, “?” 1, not answered 1)
             70        [illegible]    (“T” 8, “F” [illegible], “?” 1)
             74        [illegible]    (“T” 3, “F” 6, “?” 7)

A word, perhaps, should be said about questions 75-112 of the Noll 
test which deal with the habit of accuracy in calculation and observa- 
tion. As these questions deal with facts rather than opinions, there 
was virtually no disagreement with the Noll answer key on the part of 
the scientists who took the tests. These items should, therefore, 
continue to be scored in the future according to the regular key. 


SUMMARY 


The Noll test of scientific thinking has been devised to measure the 
extent to which school children can take an objective, scientific view- 
point toward a variety of problems. In brief, it purports to measure 
the scientific attitude. 

In order to check its validity, this test was given to sixteen top 
ranking scientists on the faculty of the University of Illinois. Each of 
these men holds the Ph.D. degree, and has a rank of associate professor 
or higher in the University. The branches of science represented by 
these scientists are: Bacteriology, botany, chemistry, entomology, 
physics, psychology, and zoology. 

The results of the investigation show that for many items of both 
forms of the test, the answers given by the scientists disagree with 
those given in the published scoring key. Several questions are found 
for which none of the sixteen scientists give acceptable answers accord- 
ing to the Noll key. There are twenty-six invalid items in Form 1 and 
twenty-five invalid items in Form 2, if it be held that for an item to be 
considered valid it must be answered according to the scoring key by 
three-fourths of the scientists. These facts cause one to question the 
validity of this test as a measure of scientific thinking. 

It is suggested, however, that the validity could be improved con- 
siderably by revising the scoring key so that it would fall in line with 
the answers of the group of scientists who took the tests. In a number 
of questions there is almost complete disagreement among the scientists 
as to what are the correct answers. It is suggested that these questions 
be eliminated from the tests altogether.


ON THE USE OF THE INTRA-CLASS CORRELATION
COEFFICIENT IN THE TESTING OF THE 
DIFFERENCE OF CERTAIN 
VARIANCE RATIOS* 


E. M. JELLINEK 


Samples of measurements of biological functions frequently are 
heterogeneous to a marked degree. This is particularly true in the 
case of human biology, where a sample may consist of several measure- 
ments made on each of a number of individuals. Each individual may 
constitute a population in himself, or at any rate the individuals belong 
to several populations. Thus, in these cases, a sample consisting of 
k random readings for each of n’ individuals is not drawn randomly 
from one population, but may have been drawn from several up to n’ 
different populations. 

The presence of this heterogeneity generates many problems in the 
application of statistical methods to human biology. These problems 
are too manifold to consider in a brief note. I shall restrict the 
discussion to the case where the variable under consideration is 
normally distributed, and where the heterogeneity of the data arises 
only from the fact that the sample has been drawn from populations 
differing in their means. Heterogeneity, in this sense, relative to the
variable x, may be determined by analyzing the “Total” variance
of the kn’ readings into two components, “Between Individuals”
variance with (n’ — 1) degrees of freedom, and “Within Individuals”
variance with n’(k — 1) degrees of freedom. (1) If the “Between
Individuals” variance is a significant multiple of the “Within
Individuals” variance, that is, if the ratio, which we may designate as
F?, is significantly different from unity, the group of n’ individuals is 
heterogeneous, for the means of individuals differ more from each 
other than an individual differs within himself. In determining
significance we must, of course, take some lower bound of the probability,
as .05. 

While establishing the presence or absence of heterogeneity within 
a given group is in itself of great importance to the investigator, he
might be further interested in knowing whether or not the heterogene- 
ity of a certain group of marked members is greater than that of a 
group of unmarked members. The variation “ Between Individuals”’ 





*From the Memorial Foundation for Neuro-Endocrine Research and the 
Research Service of the Worcester State Hospital. 
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measures how much the individuals differ from each other and it might 
suffice to compare these measures for the two groups. In that case, 
however, we merely can assert that one group is more compact than the 
other. Heterogeneity is not implied in that statement, since there 
may be homogeneity in the group in spite of a large “Between
Individuals” variance if the “Within Individuals” variance is also large.
If we wish to compare the heterogeneity relative to the variation
existing “Within Individuals,” we must compare two F ratios, or some
legitimate functions of them. The latter viewpoint is adopted here. 
A precise comparison of the heterogeneity of two or more groups of 
data may contribute greatly to the interpretation of the data. The 
problem of comparing heterogeneity is not peculiar to statistical 
methodology applied to biological data; the problem may also arise 
in sociology, education, economics, etc. 

In order to compare precisely two F ratios we would require the 
distribution of their difference or of their ratio. Both of these dis- 
tributions appear to be exceedingly complex and have not been 
evaluated thus far. I should like to offer, however, an indirect test 
for the significance of the difference in the two ratios. This test is 
based on the following considerations: 


$$
F = \frac{\text{``Between Individuals'' Sum of Squares}/(n' - 1)}{\text{``Within Individuals'' Sum of Squares}/[(k - 1)n']}\;{}^{*}
$$

where the Sum of Squares is the numerator of the variance. The
intraclass correlation coefficient, denoted here as r’, may be expressed
as

$$
r' = \frac{k(\text{``Between'' Sum of Squares}) - \text{``Total'' Sum of Squares}}{(k - 1)(\text{``Total'' Sum of Squares})}
$$

Thus r’ can be written in terms of F as follows:

$$
r' = \frac{(n' - 1)F - n'}{(n' - 1)F + n'(k - 1)}
$$
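
The intermediate algebra is not shown in the text. One way to obtain the result is sketched here, writing B and W (symbols introduced only for this sketch) for the “Between Individuals” and “Within Individuals” sums of squares, so that the “Total” sum of squares is B + W.

```latex
\begin{align*}
r' &= \frac{kB - (B + W)}{(k - 1)(B + W)}
    = \frac{(k - 1)B - W}{(k - 1)B + (k - 1)W}, \\[4pt]
F  &= \frac{B/(n' - 1)}{W/[(k - 1)n']}
    \quad\Longrightarrow\quad
    (k - 1)B = \frac{(n' - 1)F}{n'}\,W, \\[4pt]
r' &= \frac{\dfrac{(n' - 1)F}{n'}\,W - W}{\dfrac{(n' - 1)F}{n'}\,W + (k - 1)W}
    = \frac{(n' - 1)F - n'}{(n' - 1)F + n'(k - 1)}.
\end{align*}
```

The last step multiplies numerator and denominator by n’/W, which is permissible whenever the “Within Individuals” sum of squares is not zero.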


For a given number of readings per individual, k, there is thus a 
constant relationship between F and r’. The problem of testing the 
significance between two such F ratios may, therefore, be restated in 
terms of the difference between the corresponding r’’s. This latter test
is readily performed. Transforming r’ into the almost normally
distributed variable z,

$$
z = \frac{1}{2}\log_e\left[\frac{1 + (k - 1)r'}{1 - r'}\right]
$$

Then, approximately,

$$
\sigma_z^2 = \frac{k}{2(k - 1)(n' - 2)}
$$

and

$$
\sigma_{z_1 - z_2} = \sqrt{\sigma_{z_1}^2 + \sigma_{z_2}^2}
$$

For the special case, k = 2,

$$
\sigma_z^2 = \frac{1}{n' - \tfrac{3}{2}}
$$

and z is much more closely normally distributed. For k > 2, it is
necessary that n’ be large before the test can legitimately be applied.
Then complete application of the test requires that the relative deviate

$$
\frac{z_1 - z_2}{\sigma_{z_1 - z_2}}
$$

be referred to tables of the normal probability integral.

* F is generally the larger variance over the smaller variance. For the present
problem, however, we must define F as above, since the problem obtains only if the
“Between” variance is larger than the “Within” variance.


The use of the test may be illustrated by the following example: 
Two repetitions of the erythrocyte count were made on each of
ninety-three dementia praecox patients and on each of thirty normal controls.
Analysis of variance yielded the following values: 


ERYTHROCYTES IN UNITS OF 10,000

                               Patients                  Normal controls
Source of variation       Degrees of               Degrees of
                          freedom      Variance    freedom      Variance
Between Individuals          92          3831         29          2003
Within Individuals           93           607         30           727
Total                       185          2211         59          1354
                          F = 6.31    P < .01      F = 2.76    P < .01

















Thus both groups are significantly heterogeneous. Then r’₁ = .724,
z₁ = .916, r’₂ = .455, z₂ = .491.

Consequently, our relative deviate is .425/.215 = 1.98 and
P < .048. We conclude that dementia praecox patients are probably
significantly more heterogeneous relative to the erythrocyte count than
are normal controls.
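
The arithmetic of this example can be retraced step by step. The sketch below is an illustration only: the function names are invented here, the inputs are the rounded F ratios and group sizes quoted in the example with k = 2, and the probability is computed as a two-sided normal tail area, which is an assumption about how the reported P was obtained.

```python
import math

def intraclass_r(F, n, k):
    """Intraclass correlation r' from the F ratio, n individuals, k readings each."""
    return ((n - 1) * F - n) / ((n - 1) * F + n * (k - 1))

def fisher_z(r, k):
    """z transformation of an intraclass correlation with k readings per class."""
    return 0.5 * math.log((1 + (k - 1) * r) / (1 - r))

def z_variance(n, k):
    """Approximate sampling variance of z; the k = 2 case uses n - 3/2."""
    if k == 2:
        return 1.0 / (n - 1.5)
    return k / (2.0 * (k - 1) * (n - 2))

def compare_heterogeneity(F1, n1, F2, n2, k):
    """Relative deviate and two-sided normal probability for z1 - z2."""
    z1 = fisher_z(intraclass_r(F1, n1, k), k)
    z2 = fisher_z(intraclass_r(F2, n2, k), k)
    se = math.sqrt(z_variance(n1, k) + z_variance(n2, k))
    deviate = (z1 - z2) / se
    p = math.erfc(abs(deviate) / math.sqrt(2))  # two-sided tail area
    return z1, z2, se, deviate, p

# Patients: F = 6.31, n' = 93; normal controls: F = 2.76, n' = 30; k = 2.
z1, z2, se, deviate, p = compare_heterogeneity(6.31, 93, 2.76, 30, k=2)
print(f"z1={z1:.3f} z2={z2:.3f} se={se:.3f} deviate={deviate:.2f} p={p:.3f}")
```

Starting from the rounded F ratios, the sketch agrees with the printed values of r’, z, the standard error, and the relative deviate to the number of places given.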

When k is not constant from individual to individual, the testing 
of a single group to determine heterogeneity is again performed by the 
use of the F ratio. However, the conversion of F into r’ is no longer 
exact.


ARE SCORES INCREASED ON OBJECTIVE TESTS BY
CHANGING THE INITIAL DECISION? 


F. K. BERRIEN 
Colgate University 


In administering objective type examinations students are some- 
times counselled to answer according to their initial impression of an 
item. The purpose of the present study was to test the wisdom of 
such advice. 

Studies of this problem by Lehman,¹ Mathews,² Lowe and
Crawford³ agree that between fifty and seventy-five per cent of the
students who make one or more changes on objective-type items, 
produce a net increase in their total scores by such changes. Lehman 
and Mathews present further evidence showing that superior students 
generally make fewer changes and are more likely to raise their grades 
by those changes than inferior students. Lehman cites an unpublished 
study by J. P. Porter confirming his results. 





























TABLE I

                                      Class A        Class B        Class C      Lehman’s data
                                     N  Per cent    N  Per cent    N  Per cent    N  Per cent
Students who by changing answers—
   lowered score                    16     25      13     19       5     23      14     28
   raised score                     38     58      44     66      17     77      20     40
   failed to alter score             7     11       8     12       0      0       5     10
Students making no change            4      6       2      3       0      0      11     22
Total                               65    100      67    100      22    100      50    100














The present report is based on approximately twenty-eight thousand
test responses made by one hundred fifty-four students divided
between three classes. Classes A and B had registrations of sixty-five 
and sixty-seven, respectively, and were taking courses in elementary 
psychology although the subject-matter and consequently the examina- 
tions were not identical. Class C consisted of twenty-two students 
taking a course in child psychology. The examinations in all three
classes consisted of true-false and multiple-choice items divided in the
ratio of about two to one, respectively. 

Previous to the examination day the students were informed of the 
general purpose of the study and were asked merely to cross out rather 
than erase any answer which “on second thought” they wished to 
change. They were assured that only the corrected answer would 
count in determining their test score. These instructions were given 
in advance to avoid confusion which might arise from suddenly 
introducing special instructions at the last moment. The directions 
were of course repeated at the time of the examination. 

Gross results of the present study and data reproduced from 
Lehman’s report are given in Table I. 

The data presented by Mathews, Lowe and Crawford are not 
reducible to Table I. However, their data cannot be interpreted as 
contradicting these findings. It is quite obvious that students in 
general are more apt to raise than lower their grades by changing 
their answers on doubtful questions. 
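
As a rough check on this general statement, the counts for the three Colgate classes in Table I can be pooled. The sketch below is illustrative only: the counts are those printed in Table I, but the row labels "lowered score" and "raised score" are inferred from the text, and the helper name `pooled_percentages` is an assumption introduced here, not part of the original study.

```python
# Counts from Table I for Classes A, B, and C (Lehman's data excluded).
# The labels "lowered score" and "raised score" are inferred from the text.
COUNTS = {
    "lowered score": [16, 13, 5],
    "raised score": [38, 44, 17],
    "failed to alter score": [7, 8, 0],
    "made no change": [4, 2, 0],
}


def pooled_percentages(counts):
    """Return each category's share of all students, in per cent."""
    total = sum(sum(per_class) for per_class in counts.values())
    return {
        label: 100.0 * sum(per_class) / total
        for label, per_class in counts.items()
    }


if __name__ == "__main__":
    for label, pct in pooled_percentages(COUNTS).items():
        print(f"{label:<22} {pct:5.1f}")
```

On these pooled counts roughly 64 per cent of the one hundred fifty-four students raised their scores and roughly 22 per cent lowered them.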

However, a further question is immediately raised. What about 
the students—constituting about one-fourth of the classes—whose 
grades are lowered by changing the doubtful answers? Can they be 
identified in advance of the examination? Are they in a given part of 
the class distribution? 

Lehman, supported by Porter and Mathews, maintained that such 
students are more likely to be found at the low end of the class and, 
moreover, are the ones who make the most changes. Our data do not 
support these conclusions, as shown in Table II, based only on classes A 
and B. (The number of students in class C was too small to warrant 
analysis in this respect.) 

Examination of this table shows that contrary to what we should 
expect on the basis of the data previously published, the greatest 
number of changes do not always occur in the lower portion of the 
class. Class A shows the greatest average number of changes in the 
highest hexile. Class B comes nearer agreeing with our expectations, 
showing the greatest number of changes in the second hexile, although 
the lowest and highest hexiles are equal in this respect. 

This table, moreover, does not indicate that superior students as 
measured by the final score are necessarily superior when they change 
their minds about a given answer. It happens that the greatest per 
cent of correct changes does occur in the highest hexile of class A, but in 
only the fourth hexile of class B. The lowest per cent of correct 
changes occurs in the second hexile of class A and in the first hexile of 
class B. There seems to be some slight difference in the percentage of 
correct changes when one compares the upper and lower halves of the 
classes, but such a difference is unreliable. These data certainly 
provide no justification for advising the low students to stick to their 
first impressions in order to raise their grades. 

The question now arises: What factors may have operated to 
produce the differences in findings which have just been noted? Two 
possibilities present themselves. 

In the first place, a selective factor may have operated in the group 
studied at Colgate. Such a factor seems unlikely, since the gross 
results presented in Table I are in agreement with other studies. At 
best, such a factor would be exceedingly difficult to demonstrate. 

In the second place, the difficulty of the test items may have some 
influence on test behavior. If an examination were relatively easy, 
then few good students would be doubtful of their first answers, while 
the poorer students would probably change more of their first responses. 
If the test items in general were relatively more difficult, then the good 
students would be more apt to recognize subtleties on reconsidering 
questions. They would, therefore, make more changes. The niceties 
of meaning on which many objective questions hinge might escape the 
poorer students to such an extent that doubt as to their first response 
would never arise. Hence changes would be fewer in such a group. 

This hypothesis is strengthened somewhat when one considers that 
the examination for class B was constructed so that it would be rela- 
tively easy. Subtle questions were omitted as much as possible. 
Data in Table II support our hypothesis, since the lower students in 
class B which had an easy examination made more changes than similar 
ranking students in class A. 

This study demonstrates at least two points: First, in harmony with 
the previous studies in this area, students in general are more likely to 
raise than lower their scores by changing answers on doubtful items. 
Second, probably no consistent relationship exists between an indi- 
vidual’s standing in his class distribution and his success in correcting 
doubtful answers. Furthermore, it is possible that the difficulty of the 
test may determine whether the inferior or superior students are more 
likely to make changes in their answers. 
THE EDUCATIONAL VALUE OF STAMP-COLLECTING 


DONALD E. SUPER 
Clark University 


It has often been asserted by philatelic enthusiasts that postage 
stamp collecting has great educational value: It results, they claim, in 
increased knowledge of geography, history, international affairs, and 
economics. It not infrequently happens that men who collect stamps 
themselves encourage others to take up the hobby for this reason, 
and school people are occasionally urged to sponsor the development 
of collecting through their classes or clubs. 

The objective of this study is to determine the actual effect of 
stamp-collecting upon educational achievement in related areas. 


METHOD 


The Coöperative General Achievement Test, Form O, Part I, 
Survey of the Social Studies, was selected as an instrument suitable 
for measuring significant social knowledge in a number of areas, and 
the Otis Self-Administering Test of Mental Ability, Higher Form A, 
as a measure of scholastic aptitude. 

These instruments were administered by the writer to one hundred 
twenty-two juniors and seniors in the high schools of Worcester 
County, Massachusetts, in connection with the high-school testing 
program of the Clark University Guidance Service. At the same time 
each student filled out the Student’s Inventory used in that service, 
to which the following questions had been added: 


Have you ever collected postage stamps? Yes__ No__. If so, give date 
of starting: 19__; and date last active as a collector: 19__; or, still active. 


FINDINGS 


The means and standard deviations of the scores on the Social 
Studies Test were calculated for two groups, those who had at some 
time collected stamps and those who had never collected. The mean 
standard scores were 59 for the collectors, sigma being 9.22, and 60 
for the non-collectors, sigma being 9.78. This difference of one stand- 
ard score point in favor of the non-collectors, divided by its standard 
error, gives a ratio of .59, which is clearly not statistically significant. 
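
The ratio described above can be sketched as follows. This is a minimal illustration, not the author's own computation: the means and sigmas are those given in the text, but the group sizes are not reported, so the values 79 and 43 used below are assumptions (they sum to one hundred twenty-two and are consistent with the item percentages reported later). The function name `critical_ratio_means` is likewise hypothetical. Because the published means are rounded, the result need not reproduce the reported .59 exactly.

```python
import math


def critical_ratio_means(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two means divided by its standard error."""
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / se_diff


if __name__ == "__main__":
    # Means and sigmas are from the text. The group sizes 43 and 79 are
    # ASSUMED for illustration; the text gives only the total of 122.
    ratio = critical_ratio_means(60, 9.78, 43, 59, 9.22, 79)
    print(round(ratio, 2))
```

Under these assumptions the ratio falls well below the values conventionally required for statistical significance, in agreement with the conclusion stated in the text.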

It might be that the failure to find a higher degree of world knowl- 
edge in the collecting group is due to differences in the mental ability 
of the two groups, enabling the non-collectors to become as well- 
informed as the collectors because of superior mental ability. The 
mean IQ’s of 114 for the collectors and 115 for the non-collectors, 
D/σD being .51, indicate that this is not the case: the groups are of 
approximately equal ability and are equally well-informed. 

As one might question the validity of our assumption in grouping 
together all those who at some time collected stamps; namely, that 
any increase in general knowledge due to collecting would have 
permanent effects, it should be mentioned that, all of our subjects 
being high-school students, none of those who had once collected had 
been inactive long enough for the length of time lapsed since collecting 
to seem important. The number of years active as a collector was 
correlated with intelligence and social studies scores. The product- 
moment coefficients follow: 


Intelligence and social studies ........................ r =  .67 
Intelligence and number of years collecting ............ r = -.09 
Number of years collecting and social studies .......... r = -.04 


Intelligence is seen to be related to scores on the achievement test, 
but not to number of years of collecting experience, nor is the last 
factor related to world knowledge. Holding intelligence constant, 
years of collecting and social studies scores have a partial correlation 
coefficient of .03; there is, clearly, no relationship between the number 
of years a person has collected postage stamps and his knowledge of 
world affairs. 
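
The partial coefficient can be checked from the three product-moment coefficients with the standard first-order partial correlation formula. The sketch below assumes that this is the formula the author used; the function name `partial_correlation` and its argument names are introduced here for illustration only.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # x = years of collecting, y = social studies score, z = intelligence.
    r = partial_correlation(r_xy=-0.04, r_xz=-0.09, r_yz=0.67)
    print(round(r, 2))  # 0.03
```

The result agrees with the coefficient of .03 reported above.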

There is still another possibility; namely, that the Coöperative 
Test covers one kind of social knowledge, whereas stamp collecting 
results in another. We have already indicated that this test was 
selected as measuring significant social knowledge: The questions deal 
largely with understanding of broader social issues and principles; 
questions of fact are less common. Now, it is just this latter type of 
knowledge which one would expect to result from collecting stamps: 
Capitals, boundaries, location of countries, historical dates. Scarcity 
of items of this type in the Coöperative Test makes impossible a final 
answer to this question, but the data have been analyzed for what 
they contain. 

Three questions were selected by three judges trained in educational 
psychology as being primarily factual and of the type on which stamp 
collectors should do better if the above assumption were true. These 
three questions (1, 2, 4) occur in the Geography section and relate to 
location and size of countries; that the number is no larger is due to 
the fact that no other items appeared to meet the criteria. The per- 
centages of collectors and non-collectors correctly answering these 
questions were tabulated, and the significance of the differences cal- 
culated; the data follow: 











            Per cent responding correctly 
Item        Collectors     Non-collectors       D/σD 
  1            73.42            69.77            .42 
  2            41.78            34.88            .75 
  4            72.18            69.77            .27 














For these three questions, at least, the differences are not signifi- 
cant, and the indication is that even in strictly factual, geographical, 
knowledge those who collect stamps are no better informed than 
others. This latter conclusion is tentative, however, and needs to be 
checked with a larger number of items. 
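
The significance ratios for the three items can be approximated by dividing each difference in percentages by its standard error. The sketch below is illustrative only: the group sizes of 79 collectors and 43 non-collectors are assumptions, since the text does not report them, and the function name `critical_ratio_proportions` is hypothetical. The unpooled standard error used here is one common form; the author's exact formula is not stated.

```python
import math


def critical_ratio_proportions(pct_1, n_1, pct_2, n_2):
    """Difference between two percentages divided by its standard error."""
    p_1, p_2 = pct_1 / 100.0, pct_2 / 100.0
    se_diff = math.sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    return (p_1 - p_2) / se_diff


if __name__ == "__main__":
    # ASSUMED group sizes; only the total of 122 is given in the text.
    n_collectors, n_non_collectors = 79, 43
    # Item number -> (per cent of collectors, per cent of non-collectors).
    items = {1: (73.42, 69.77), 2: (41.78, 34.88), 4: (72.18, 69.77)}
    for item, (collectors, non_collectors) in items.items():
        ratio = critical_ratio_proportions(
            collectors, n_collectors, non_collectors, n_non_collectors
        )
        print(item, round(ratio, 2))
```

Under the assumed group sizes the computed ratios fall within about .01 of the tabulated values, all far short of significance.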

The findings reported above apply, of course, only to young people 
of high-school age. It is still quite possible that elementary-school 
children who collect stamps learn more factual geography, and that 
adult stamp collectors learn more about certain fields, depending on 
how they collect stamps. These are topics for separate study, but 
if the former possibility is true our study would seem to indicate that 
the non-collectors catch up with the collectors by the time they reach 
high school. It is possible, also, that the hobby has values not 
related to world knowledge, a possibility not investigated here. 


SUMMARY 


Tests of intelligence and of achievement in social studies were 
administered to one hundred twenty-two high-school students, and 
their scores were related to information about their experience in 
stamp collecting. It was concluded that: 

(1) Collecting stamps does not result in increased knowledge of the 
social studies in high school; 

(2) Collecting stamps probably does not result in an increase in 
the factual knowledge of high-school students; 

(3) There is no relationship between mental ability and interest 
in stamp-collecting. 
THE MOCK CLINIC AS A TEACHING DEVICE 


WILLIAM M. LEPLEY 
The Pennsylvania State College 


It is the purpose of this brief paper to describe an apparently 
effective technique used in the conduct of a class in psychopathology 
at the University of Colorado. The use of the clinic as a teaching 
device is not new. It has been used in medical schools for many years. 
The presentation of subject-matter by the method of case study is 
common as an educational practice. The device described combines 
certain characteristics of each of these and includes modifications that 
appear to contribute to effectiveness. 

The class population consisted of fourteen students. The fields of 
interest represented in this group included: Medicine, public-school 
teaching, college teaching, and professional nursing. In addition, 
there were undergraduate students without definitely formulated 
occupational interests. This class met for sixty minutes, five days a 
week for five weeks. 


THE PROCEDURE 


The first week was devoted to lectures by the instructor. The 
organization of these lectures was in no way unique. A conventional 
outline, which included the basic behavior mechanisms and their 
maladaptive manifestations, was followed. 

At the beginning of the second week the following instructions 
were given to the class: “On certain previously designated days from 
now on until the end of the term we are going to be clinicians and 
patients. At our last meeting this week we are going to hold a clinic. 
I shall be the clinician and you will be the patients. I have prepared 
fourteen slips of paper, one for each of you. On each slip is the tech- 
nical name of a conventionally classified pathological syndrome. 
Before the meeting of our clinic, each one of you is to make such prep- 
aration that you will be able to appear here at the meeting and represent 
with action a typical case of the disorder which has fallen to your lot. 
One by one you will be called before me and the rest of the class. I 
shall conduct the clinical examination and each of you will act the 
part of a patient. You must make your preparation secretly with 
regard to the rest of the class and to me. No one shall know the 
nature of your affliction until after you have performed here and after 
we have independently diagnosed. This is important. Your grade 
in the course will not depend on your performance, nor will it depend 
upon your skill in diagnosis. Grades will be computed on the basis 
of tests and quizzes which, presumably, will measure what you have 
learned in the course of these and other procedures. Your preparation 
for these performances might well follow this outline. First, read 
everything that you can find in your text that bears upon your par- 
ticular disorder. Second, go to the library and, in the appropriate 
journals, read actual case histories of patients of your type. Then, 
at last, be as original as you like and prepare to act the part of a patient 
which will faithfully represent your type.” The slips were then 
distributed and further instructions and advice were given as asked for. 

Following the procedure detailed above, ten mock clinics were 
conducted. These were evenly distributed in time throughout the 
remainder of the term. The days on which clinics were not held were 
devoted to formal reports by the students. Though irrelevant here, 
it may be well to say that the topics were chosen by the students and 
in each case the report represented an intensive study of some con- 
ventionally classified syndrome. The only departure from the pro- 
cedure outlined above occurred when the students, one by one, took 
over the part of the examining clinician. This practice began with 
the third clinic and was in effect for the remainder of the term. A 
careful day-by-day record was kept of the individual student’s correct 
and incorrect diagnoses. A summary of this record is presented in 
Table I. 


TABLE I. PERCENTAGES OF CORRECT DIAGNOSES IN SUCCESSIVE MOCK CLINICS 

Clinic       1st   2nd   3rd   4th   5th   6th   7th   8th   9th    10th 
Per cent    14.3  28.6  21.4  50.0  64.3  85.7  92.9  92.9  100.0  100.0 


INTERPRETATION 


The above statistics are submitted with an experimentalist’s 
apology. Certain it is that these values defy precise interpretation. 
They must represent resultants of several factors. The following 
factors are probably effective: skill in diagnosis, skill in mock represen- 
tation, information gained from formal reports, and degree of motiva- 
tion. No attempt can be made here to assign relative potencies. 
That must wait upon the conclusion of critical experiments. The 
author’s evaluation of the mock clinic device is highly subjective and 
is based largely upon his casual observation of class morale. No one 
cut classes. Even the auditors came every day. Further, everyone 
had so much fun on clinic days that the classroom was strongly sus- 
pected of sheltering extremely unacademic procedures. It might be 
added that no grade below B was given in this class. 


CONCLUSION 


The author ventures to write the conclusion in the form of a 
prediction. He predicts that a carefully controlled parallel group 
experiment would reveal a relatively high degree of effectiveness for 
the mock clinic as a teaching device. 
BOOK REVIEWS 


ROBERT S. WOODWORTH. Experimental Psychology. New York: 
Henry Holt and Co., 1938, pp. 889. 


Psychology as a science has undoubtedly suffered in its acceptance 
by other sciences because of its theoretical systematizations or 
“schools.” Systematic disagreements have characterized all young 
sciences so that psychology is not unique. However, any science 
purporting to be experimental should in the course of nearly eighty 
years accumulate a respectable body of experimentally verified 
knowledge. That psychology has done so is, of course, the opinion 
of all psychologists, however difficult it might be for an individual to 
prove it to his scientific colleagues. In his Experimental Psychology 
Woodworth has afforded a positive answer to the critics of psychology. 
This is probably the most significant psychological treatise of this 
quarter century. It can claim a scientific equality with the best of 
the general treatises in chemistry, physics, biology, or any other 
science. 

Woodworth wisely limits his field very definitely and disclaims 
any attempt to deal with abnormal, child, applied, social, or individual 
difference psychology. In terms of method he limits himself to those 
areas where study has been by means of controlled experiment and 
has omitted purely observational and correlational studies. He quite 
clearly points out that these fields and methods are of coördinate 
importance with that which he has treated, but mere magnitude 
demanded strict delimitation. The specific major topics discussed in 
the order of the chapters are: Memory and learning, feeling and 
emotion, reaction time, association, experimental esthetics, psycho- 
logical methods, sensation and perception, attention, reading, problem- 
solving and thinking. When necessary for full understanding an 
historical approach has been used, but always the aim has been to 
present a full but concise statement of the present experimental status 
of the topics discussed. The book has been in progress some thirty 
years during which the author has maintained an intimate contact 
with the ever increasing literature. The extent of that contact is 
intimated in the bibliography which occupies thirty-four closely 
printed, double-column pages. 

A final word should indicate the book’s importance to the educa- 
tional psychologist. While the author specifically says that he has 
not dealt with educational psychology, the list of subjects enumerated 
above clearly indicates that a large portion of the work is of direct 
concern to the specialist in this field. There can be no doubt that this 
is a “must” book for every psychologist’s library, whatever may be 
his own specific interest. 
C. M. LOUTTIT. 
Indiana University. 


H. A. IMUS, J. W. M. ROTHNEY, and R. M. BEAR. An Evaluation of 
Visual Factors in Reading. Hanover: Dartmouth College Pub- 
lications, 1938, pp. 144. 


This is a preliminary report of a program to improve the reading 
ability of the Dartmouth College student body, mainly freshmen. 
Although the aim was instructional service to students, the mono- 
graph is essentially an experimental report. It includes a description 
of the visual survey, measurement of reading status, and the remedial 
program. These were, of course, coördinated with the educational 
program. The visual survey included measurement of eye movements, 
a complete clinical eye examination, and determination of eye domi- 
nance. Standardized tests were employed for measuring reading 
proficiency and scholastic aptitude. 

Only a few of the many important findings of this study will be 
mentioned. The frequent occurrence of ocular defects among the 
subjects suggests the need of thorough eye examinations for all college 
students. No significant relationship was discovered between eye 
defects and (1) reading performance, (2) eye movements in reading or 
(3) scholastic achievement. Nor did correction of ocular defects 
result in increased reading ability and academic achievement. Read- 
ing achievement correlated .30 to .40 with academic standing. Reme- 
dial training produced significant improvement which was maintained 
for at least one year. 

Perhaps the only weak part of the study is the evaluation of the 
eye-movement records. These records were found to be unreliable 
and of low validity. These findings were undoubtedly due to the 
method of obtaining the records and the unsatisfactory criterion 
employed to establish validity. It has been shown by others that, 
with proper procedures, eye-movement records are highly reliable 
and valid. The reviewer, however, agrees with the statement that 
the eye-movement camera is of little value to the reading clinician. 

The authors deserve high praise for producing one of the most 
valuable studies on the subject. It is obvious that the experiment 
was efficiently planned. Treatment of the data is complete and 
evaluation of the results thorough. The conclusions are conservative 
and sound. The report should be welcomed since it emphasizes some 
important cautions and tends to clear up several issues which previously 
had been in doubt, partly because of bias and loose thinking 
on the subject. 
MILES A. TINKER. 
University of Minnesota. 


ERNEST W. TIEGS. Tests and Measurements in the Improvement of 
Learning. Boston: Houghton-Mifflin Company, 1939, pp. xxi 
+ 490. 


This book is, according to the author, “an attempt to meet the 
measurement problems and needs of teachers on different levels, as 
well as to provide data for the solution of supervising and adminis- 
trative problems.” In part it is a revision, and in part a supplement 
to, the author’s earlier volume Tests and Measurements for Teachers. 

The thesis around which the book is organized is that measure- 
ment in education is for the purpose of appraising pupil work and 
diagnosing difficulties. Tests as ends in themselves or as devices for 
marking pupils have no useful place in modern education. No 
matter to what educational philosophy one adheres, he must 
admit that his purpose is to help children grow and change in relation 
to certain goals. Measurement of the sort described by Dr. Tiegs is 
necessary “in the guidance of learning activities” directed toward 
those goals. Testing is not synonymous with measurement because 
the latter may involve many diagnostic procedures in which quan- 
tification, if any, is incidental. Throughout his discussion the 
author maintains a careful, common-sense point of view which should 
make the work invaluable for teachers and for teacher training. 

The first thirteen chapters are devoted to discussion of measure- 
ment procedures. Tests are mentioned but no space is wasted in 
describing them; rather the author succeeds in making clear the values 
of standard instruments for the teacher’s daily classroom problems. 
Diagnostic and appraisal methods are described, and then the applica- 
tion of these to primary, elementary, secondary, and higher education 
is demonstrated. Three chapters are devoted to ability grouping, 
curriculum, and teaching difficulties as these may be related to the 
measurement of pupil growth. Two chapters are devoted to statis- 
tical procedures and three to administrative problems related to 
measurement. In the last chapter there is an interesting and useful 
innovation of presenting questions frequently asked by school people 
and the author’s answers to them. 

The summary of the contents given in the last paragraph indicates 
the wide area covered. As might be expected no subject is extensively 
treated. This must not be taken to mean that the book is shallow or 
scientifically unsound. As a general textbook for courses in educa- 
tional measurements it should be eminently satisfactory. As a book 
for the individual teacher’s guidance it is superior to most of the 
available works in the field. 
C. M. LOUTTIT. 
Indiana University. 


A. R. GILLILAND and E. L. CLARK. Psychology of Individual 
Differences. New York: Prentice-Hall, 1939, pp. 535. 


The authors’ statements that their purpose is to collect together 
the more important facts and conclusions in the field of individual 
differences and that they have avoided complete changes in inter- 
pretation of the works of others characterize the book fairly well. 
The outline of the text follows pretty much the materials that are 
traditionally accepted as belonging in the field. The discussions of 
differences in physical characteristics, sex differences, and race differ- 
ences deserve special commendation. Large emphasis is given to 
differences in personality which is the longest chapter in the text. 
Pertinent lengthy sections are devoted to the bearing of individual 
differences on learning and teaching, and on applications in business 
and industry. Each chapter is followed by questions and references. 

The interested reader will find points with which to disagree and 
things to criticize. The tendency to accept certain results without 
critical evaluation occurs too frequently, especially since the validity 
of some of the material is being seriously questioned. After citing 
material from Wellman’s reports the authors state, “These studies 
certainly show that a superior environment improves the IQ, and it is 
safe to say that it improves general intellectual standing as well.” 
Differences in the emphasis devoted to experiments on similar prob- 
lems might be interpreted by some as biased. For example, almost 
four pages are devoted to Freeman’s study of foster children and 
only one page is given to Burks’ report. On page 62, the reference to 
Fig. 26 is obviously meant to be Fig. 27. There are ample data show- 
ing that about eight per cent of men are color blind rather than four 
per cent as given in the text. The interpretation of material in Table 
XVII is obviously in error. Seashore musical test scores are given for 
Negroes and whites and the authors state that “the Negro is superior 
to whites in musical ability, as measured by the Seashore tests.” 
Of fifteen pairs of scores given, the Negroes are superior on only five. 
When composite scores are considered, the fifth-grade Negroes are 
0.6 of a point better than whites. For the eighth grade and for 
adults, the whites are definitely superior. 

Although a reviewer may criticize several things in the book, the 
worker in individual differences will find much useful material in 
this text. The authors have done a commendable job with subject- 
matter that is difficult and with material, much of which is still 
controversial. 
MILES A. TINKER. 
University of Minnesota. 


LAURETTA BENDER. A Visual Motor Gestalt Test and Its Clinical Use. 
New York: American Orthopsychiatric Association, Res. Monog. 
No. 3, 1938, pp. 176. 


This book is principally concerned with the presentation of clinical 
material derived from the use of a visual motor gestalt test given to 
children and to different types of abnormal patients. The test consists 
of nine of Wertheimer’s original patterns which the subject copies. 
Standardization of the test revealed a good separation in mean score 
from age to age. In the clinical situation, however, the qualitative 
deviations seem most important. Reliability of this subjectively 
scored test is not discussed. 

After showing how gestalten arise genetically in children, the author 
compares these gestalten with the spatial orientation of mental defec- 
tives and primitive children. Examination of optical imagery and 
tachistoscopic phenomena emphasizes the motility of the visual field 
in relation to spatial relationships. The test results for patients 
classified as sensory aphasia, dementia paralytica, psychoses, psychoneuroses 
and mental defectives revealed various disorganizations of 
gestalten, reversion to primitive features of perceiving, incompletely 
perceived gestalten and motor difficulties. 

This report presents a very interesting approach to clinical abnor- 
mal psychology. Many will raise questions, however, concerning the 
usefulness of this visual motor gestalt test. It was standardized on 
normal subjects. The scoring is highly subjective. As yet we know 
nothing about its reliability. Obviously one must have a comprehensive 
knowledge of gestalt psychology and be sympathetic to the 
gestalt approach in order to obtain any value from use of the test. 
MILES A. TINKER. 
University of Minnesota. 


MARGUERITE W. JOHNSON. Verbal influences on children’s behavior. 
University of Michigan Monographs in Education, No. 1, 1939, 
pp. 191. 


Our culture is predominantly a verbal one; therefore parents and 
teachers are continually giving children directions in this manner. 
While the mere vocabulary used in such directions may be quite 
constant there are always variations in rhetoric, intonation, and the 
subtleties of manner. Common sense suggests that such variations 
as those just listed would have significance for the child’s response. 
Dr. Johnson has put this to an interesting experimental test reported 
in this monograph. 

Eighty-four children between the ages of two and one-half and 
eight and one-half years from the University Elementary School were 
paired into an experimental and a control group. During three school 
years four adult experimenters presented to each child a series of 
seventy-five situations requiring different types of activity; the verbal 
directions for each situation were in direct contrast, e.g., simple vs. 
verbose, as given to the two groups. Record was kept of the per- 
centage of children in each group following the directions, their 
verbalizations, and in some instances records of time or accuracy. 
Results from each of the seventy-five situations are given in detail too 
great to be abstracted here. The final paragraph from the mono- 
graph aptly summarizes the findings. 


In general, greater specific and general compliance resulted in the experi- 
mentation with the more positive, specific, simple, direct, pleasant, hopeful, 
approving, and unhurried requests and prohibitions than with the more gen- 
eral, verbose, choice, depriving, disapproving, hurried, negative, threatening, 
scolding requests and prohibitions. 


This monograph should be stimulating to further research of this 
nature and also to parents and teachers in suggesting efficiency in 
their own verbal contacts with children. 

C. M. LOUTTIT. 
Indiana University. 
L. W. CRAFTS, T. C. SCHNEIRLA, E. E. ROBINSON, and R. W. GILBERT. 
Recent experiments in psychology. New York: McGraw-Hill Book 
Company, 1938, pp. 417. 


It is an interesting commentary on elementary psychology texts 
that there appears to be a steady demand for supplementary books 
of readings. The work being reviewed is such a supplementary text, 
but it has characteristics which put it in the forefront of all such books. 

The authors have selected twenty-eight topics of importance in 
current psychology including such general fields as learning, emotion, 
functions of the nervous system, native behavior, sensation, and 
individual differences, to each of which one or more chapters are 
devoted. The discussion of each topic is based upon a specific research 
or in some cases several research reports. In each instance the authors 
have written an introduction orienting the topic in psychology and 
have then rewritten the report of the experiments so that the pro- 
cedures, results, and conclusions are simplified. While the condensed 
reports are pitched to the level of the college sophomore, there is no 
sacrifice of scientific accuracy. This book can be highly recommended 
as a supplementary source for courses in elementary psychology. 

C. M. LOUTTIT. 
Indiana University. 

















